ABSTRACT:

An image encoding/decoding apparatus that performs transform coding by a
method in which blocking artifacts are suppressed or eliminated is disclosed.
Encoded data transmitted by the encoding apparatus is converted into
received image-data terms, which are subsequently overlap-transformed into
frequency coefficients and modified by a filtering operation that uses a
quantization error matrix. The quantization error matrix can be derived
from quantization error data generated in the encoding unit, or can be
provided as a look-up table in the decoding unit. The modified frequency
coefficients are converted into reduced-noise image-data terms for
reconstruction into a digital image.
52 Claims, 5 Drawing figures 
Exemplary Claim Number: 1 
Claims Text - CLTX (14):

converting said modified coefficient matrices into filtered coefficient
matrices by means of said quantization error matrix and said filter
parameters, said modified coefficient matrix comprising terms denoted by
$Sf_{s,r}(\upsilon, \mu)$, said step of converting said modified
coefficient matrices performed in accordance with the equation,
##EQU19##
transforming said filtered coefficient matrices into filtered image-data
matrices in accordance with the transform equation,

Claims Text - CLTX (40):

transforming said overlapped image-data matrices into modified coefficient
matrices comprising terms denoted by $Sv_{s,r}(\upsilon, \mu)$, by means of
an orthogonal transform basis matrix C in accordance with the matrix equation,
A new, fast, and efficient image codec
based on set partitioning in hierarchical
trees

Abstract:

Embedded zerotree wavelet (EZW) coding, introduced by Shapiro 
(see IEEE Trans. Signal Processing, vol.41, no. 12, p. 3445, 1993), 
is a very effective and computationally simple technique for image 
compression. We offer an alternative explanation of the principles 
of its operation, so that the reasons for its excellent performance 
can be better understood. These principles are partial ordering by 
magnitude with a set partitioning sorting algorithm, ordered bit 
plane transmission, and exploitation of self-similarity across 
different scales of an image wavelet transform. Moreover, we 
present a new and different implementation based on set 
partitioning in hierarchical trees (SPIHT), which provides even 
better performance than our previously reported extension of EZW 
that surpassed the performance of the original EZW. The image 
coding results, calculated from actual file sizes and images 
reconstructed by the decoding algorithm, are either comparable to 
or surpass previous results obtained through much more 
sophisticated and computationally complex methods. In addition, 
the new coding and decoding procedures are extremely fast, and 
they can be made even faster, with only small loss in 
performance, by omitting entropy coding of the bit stream by the 
arithmetic code.



Index Terms: 

arithmetic codes; codecs; data compression; entropy codes; image
coding; image reconstruction; transform coding; trees (mathematics);
wavelet transforms; arithmetic code; decoding; decoding algorithm
A New, Fast, and Efficient Image Codec Based
on Set Partitioning in Hierarchical Trees 

Amir Said, Member, IEEE, and William A. Pearlman, Senior Member, IEEE 



Abstract — Embedded zerotree wavelet (EZW) coding, intro- 
duced by J. M. Shapiro, is a very effective and computationally 
simple technique for image compression. Here we offer an alter- 
native explanation of the principles of its operation, so that the 
reasons for its excellent performance can be better understood. 
These principles are partial ordering by magnitude with a set 
partitioning sorting algorithm, ordered bit plane transmission, 
and exploitation of self-similarity across different scales of an 
image wavelet transform. Moreover, we present a new and 
different implementation based on set partitioning in hierarchical 
trees (SPIHT), which provides even better performance than 
our previously reported extension of EZW that surpassed the 
performance of the original EZW. The image coding results, 
calculated from actual file sizes and images reconstructed by 
the decoding algorithm, are either comparable to or surpass 
previous results obtained through much more sophisticated and 
computationally complex methods. In addition, the new coding 
and decoding procedures are extremely fast, and they can be
made even faster, with only small loss in performance, by omitting 
entropy coding of the bit stream by arithmetic code. 

I. Introduction 

IMAGE compression techniques, especially nonreversible 
or lossy ones, have been known to grow computationally 
more complex as they grow more efficient, confirming the 
tenets of source coding theorems in information theory that 
a code for a (stationary) source approaches optimality in the 
limit of infinite computation (source length). Notwithstanding,
the image coding technique called embedded zerotree
wavelet (EZW), introduced by Shapiro [1], interrupted the 
simultaneous progression of efficiency and complexity. This 
technique not only was competitive in performance with the 
most complex techniques, but was extremely fast in execution 
and produced an embedded bit stream. With an embedded 
bit stream, the reception of code bits can be stopped at any
point and the image can be decompressed and reconstructed. 
Following that significant work, we developed an alternative 
exposition of the underlying principles of the EZW technique 
and presented an extension that achieved even better results 
[6]. 

In this article, we again explain that the EZW technique is 
based on three concepts: 1) partial ordering of the transformed 
image elements by magnitude, with transmission of order by a 
subset partitioning algorithm that is duplicated at the decoder,
2) ordered bit plane transmission of refinement bits, and 3) exploitation
of the self-similarity of the image wavelet transform
across different scales. As will be explained, the partial ordering
is a result of comparison of transform element (coefficient)
magnitudes to a set of octavely decreasing thresholds. We say
that an element is significant or insignificant with respect to a
given threshold, depending on whether or not its magnitude equals
or exceeds that threshold.

In this work, crucial parts of the coding process — the way 
subsets of coefficients are partitioned and how the significance 
information is conveyed — are fundamentally different from the 
aforementioned works. In the previous works, arithmetic coding
of the bit streams was essential to compress the ordering
information as conveyed by the results of the significance
tests. Here, the subset partitioning is so effective and the 
significance information so compact that even binary uncoded 
transmission achieves about the same or better performance 
than in these previous works. Moreover, the utilization of 
arithmetic coding can reduce the mean squared error or in- 
crease the peak signal-to-noise ratio (PSNR) by 0.3-0.6 dB 
for the same rate or compressed file size and achieve results 
which are equal to or superior to any previously reported, 
regardless of complexity. Execution times are also reported
to indicate the rapid speed of the encoding and decoding 
algorithms. The transmitted code or compressed image file 
is completely embedded, so that a single file for an image 
at a given code rate can be truncated at various points and 
decoded to give a series of reconstructed images at lower 
rates. Previous versions [1], [6] could not give their best 
performance with a single embedded file and required, for each 
rate, the optimization of a certain parameter. The new method
solves this problem by changing the transmission priority and 
yields, with one embedded file, its top performance for all 
rates. 

The encoding algorithms can be stopped at any compressed
file size or allowed to run until the compressed file is a representation
of a nearly lossless image. We say nearly lossless because the 
compression may not be reversible, as the wavelet transform 
filters, chosen for lossy coding, have noninteger tap weights 
and produce noninteger transform coefficients, which are trun- 
cated to finite precision. For perfectly reversible compression, 
one must use an integer multiresolution transform, such as the 
S+P transform introduced in [14], which yields excellent re- 
versible compression results when used with the new extended 
EZW techniques. 
This paper is organized as follows. The next section,
Section II, describes an embedded coding or progressive
transmission scheme that prioritizes the code bits according
to their reduction in distortion. Section III explains the
principles of partial ordering by coefficient magnitude and
ordered bit plane transmission, which suggest a basis for
an efficient coding method. The set partitioning sorting
procedure and spatial orientation trees (previously called zerotrees)
are detailed in Sections IV and V, respectively.
Using the principles set forth in the previous sections,
the coding and decoding algorithms are fully described in
Section VI. In Section VII, rate, distortion, and execution
time results are reported on the operation of the coding
algorithm on test images and the decoding algorithm on
the resultant compressed files. The figures on rate are
calculated from actual compressed file sizes and on mean
squared error or PSNR from the reconstructed images given
by the decoding algorithm. Some reconstructed images are
also displayed. These results are put into perspective by
comparison to previous work. The conclusion of the paper
is in Section VIII.

II. Progressive Image Transmission

We assume that the original image is defined by a set of pixel
values $p_{i,j}$, where $(i, j)$ is the pixel coordinate. To simplify
the notation we represent two-dimensional (2-D) arrays with
bold letters. The coding is actually done to the array

$$\mathbf{c} = \Omega(\mathbf{p}) \qquad (1)$$

where $\Omega(\cdot)$ represents a unitary hierarchical subband transformation
(e.g., [4]). The 2-D array $\mathbf{c}$ has the same dimensions
as $\mathbf{p}$, and each element $c_{i,j}$ is called the transform coefficient
at coordinate $(i, j)$. For the purpose of coding, we assume
that each $c_{i,j}$ is represented with a fixed-point binary format,
with a small number of bits (typically 16 or less), and can
be treated as an integer.

In a progressive transmission scheme, the decoder initially
sets the reconstruction vector $\hat{\mathbf{c}}$ to zero and updates its
components according to the coded message. After receiving the
value (approximate or exact) of some coefficients, the decoder
can obtain a reconstructed image

$$\hat{\mathbf{p}} = \Omega^{-1}(\hat{\mathbf{c}}). \qquad (2)$$

A major objective in a progressive transmission scheme is
to select the most important information (that which yields the
largest distortion reduction) to be transmitted first. For this
selection, we use the mean squared-error (MSE) distortion
measure

$$D_{\mathrm{mse}}(\mathbf{p} - \hat{\mathbf{p}}) = \frac{\|\mathbf{p} - \hat{\mathbf{p}}\|^2}{N} = \frac{1}{N}\sum_i \sum_j (p_{i,j} - \hat{p}_{i,j})^2 \qquad (3)$$

where N is the number of image pixels. Furthermore, we use
the fact that the Euclidean norm is invariant to the unitary
transformation $\Omega$, i.e.,

$$D_{\mathrm{mse}}(\mathbf{p} - \hat{\mathbf{p}}) = D_{\mathrm{mse}}(\mathbf{c} - \hat{\mathbf{c}}) = \frac{1}{N}\sum_i \sum_j (c_{i,j} - \hat{c}_{i,j})^2. \qquad (4)$$

Fig. 1. Binary representation of the magnitude-ordered coefficients.
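A quick numerical check of (4) may help. The sketch below assumes a one-level orthonormal 2-D Haar transform as a stand-in for $\Omega$ (the helpers `haar2d`, `ihaar2d`, and `mse` are hypothetical; the experiments in Section VII use 9/7-tap filters, which are only approximately unitary) and compares the MSE measured on pixels with the MSE measured on coefficients after the small coefficients are dropped.

```python
import numpy as np

def haar2d(x: np.ndarray) -> np.ndarray:
    """One level of an orthonormal 2-D Haar transform (stand-in for Omega).
    Requires even dimensions."""
    s = 1.0 / np.sqrt(2.0)
    # Pair adjacent columns: averages on the left, differences on the right.
    x = np.hstack([(x[:, 0::2] + x[:, 1::2]) * s, (x[:, 0::2] - x[:, 1::2]) * s])
    # Pair adjacent rows: averages on top, differences at the bottom.
    return np.vstack([(x[0::2, :] + x[1::2, :]) * s, (x[0::2, :] - x[1::2, :]) * s])

def ihaar2d(c: np.ndarray) -> np.ndarray:
    """Inverse of haar2d: undo the row pairing first, then the column pairing."""
    s = 1.0 / np.sqrt(2.0)
    h, w = c.shape[0] // 2, c.shape[1] // 2
    x = np.empty_like(c)
    x[0::2, :] = (c[:h, :] + c[h:, :]) * s
    x[1::2, :] = (c[:h, :] - c[h:, :]) * s
    y = np.empty_like(c)
    y[:, 0::2] = (x[:, :w] + x[:, w:]) * s
    y[:, 1::2] = (x[:, :w] - x[:, w:]) * s
    return y

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
p = rng.integers(0, 256, size=(8, 8)).astype(float)
c = haar2d(p)
c_hat = np.where(np.abs(c) >= 20, c, 0.0)   # "decoder" knows only the large coefficients
p_hat = ihaar2d(c_hat)
print(mse(p, p_hat), mse(c, c_hat))          # equal up to floating-point rounding
```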

From (4) it is clear that if the exact value of the transform
coefficient $c_{i,j}$ is sent to the decoder, then the MSE decreases
by $|c_{i,j}|^2/N$. This means that the coefficients with larger
magnitude should be transmitted first because they have a
larger content of information.¹ This corresponds to the progressive
transmission method proposed by DeVore et al. [3].
Extending their approach, we can see that the information
in the value of $|c_{i,j}|$ can also be ranked according to its
binary representation, and the most significant bits should be
transmitted first. This idea is used, for example, in the bit-plane
method for progressive transmission [2].
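To illustrate the greedy ordering implied by (4), the following sketch (a hypothetical helper that works directly in the coefficient domain, where (4) holds exactly) sends coefficients exactly in decreasing order of magnitude and records the MSE after each one; every step lowers the MSE by $|c_{i,j}|^2/N$, so the largest drops come first.

```python
import numpy as np

def mse_trajectory(c: np.ndarray) -> list[float]:
    """MSE after each coefficient is sent exactly, largest |c| first,
    starting from the all-zero reconstruction."""
    n = c.size
    c_hat = np.zeros(c.shape)
    history = [float(np.sum(c ** 2)) / n]
    for flat in np.argsort(-np.abs(c), axis=None):   # flat indices, largest first
        idx = np.unravel_index(flat, c.shape)
        c_hat[idx] = c[idx]
        history.append(float(np.sum((c - c_hat) ** 2)) / n)
    return history

print(mse_trajectory(np.array([[3.0, -1.0], [0.5, 2.0]])))
# [3.5625, 1.3125, 0.3125, 0.0625, 0.0]
```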

In the following, we present a progressive transmission scheme
that incorporates these two concepts: ordering the coefficients 
by magnitude and transmitting the most significant bits first. 
To simplify the exposition, we first assume that the ordering 
information is explicitly transmitted to the decoder. Later, we
show a much more efficient method to code the ordering 
information. 

III. Transmission of the Coefficient Values

Let us assume that the coefficients are ordered according
to the minimum number of bits required for their magnitude
binary representation, that is, ordered according to a one-to-one
mapping $\eta : I \mapsto I^2$, such that

$$\lfloor \log_2 |c_{\eta(k)}| \rfloor \ge \lfloor \log_2 |c_{\eta(k+1)}| \rfloor, \quad k = 1, \ldots, N. \qquad (5)$$

Fig. 1 shows the schematic binary representation of a list
of magnitude-ordered coefficients. Each column k in Fig. 1
contains the bits of $c_{\eta(k)}$. The bits in the top row indicate the
sign of the coefficient. The rows are numbered from the bottom
up, and the bits in the lowest row are the least significant.

Now, let us assume that, besides the ordering information,
the decoder also receives the numbers $\mu_n$ corresponding to
the number of coefficients such that $2^n \le |c_{i,j}| < 2^{n+1}$. In
the example of Fig. 1 we have $\mu_5 = 2$, $\mu_4 = 2$, $\mu_3 = 4$, etc.
Since the transformation $\Omega$ is unitary, all bits in a row have
the same content of information, and the most effective order
for progressive transmission is to sequentially send the bits
in each row, as indicated by the arrows in Fig. 1. Note that,
because the coefficients are in decreasing order of magnitude,
the leading "0" bits and the first "1" of any column do not
need to be transmitted, since they can be inferred from $\mu_n$
and the ordering.

¹Here the term information is used to indicate how much the distortion can
decrease after receiving that part of the coded message.
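The counts $\mu_n$ and an ordering $\eta$ satisfying (5) are easy to compute for integer coefficients. In this sketch the hypothetical helpers `bitplane_counts` and `magnitude_order` use `int.bit_length() - 1`, which equals $\lfloor \log_2 |c| \rfloor$ for $|c| \ge 1$; the example values are made up and are not those of Fig. 1.

```python
def bitplane_counts(coeffs: list[int]) -> dict[int, int]:
    """mu_n = number of coefficients with 2**n <= |c| < 2**(n+1)."""
    mu: dict[int, int] = {}
    for c in coeffs:
        if c == 0:
            continue                      # a zero coefficient is never significant
        n = abs(c).bit_length() - 1       # floor(log2 |c|)
        mu[n] = mu.get(n, 0) + 1
    return mu

def magnitude_order(coeffs: list[int]) -> list[int]:
    """An ordering eta satisfying (5): indices sorted by decreasing |c|."""
    return sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))

values = [63, -34, 20, 17, -12, 9, 8, 15, 3]
print(bitplane_counts(values))   # {5: 2, 4: 2, 3: 4, 1: 1}
print(magnitude_order(values))   # [0, 1, 2, 3, 7, 4, 5, 6, 8]
```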

The progressive transmission method outlined above can be 
implemented with the following algorithm to be used by the 
encoder. 

Algorithm I: 

1) output $n = \lfloor \log_2(\max_{(i,j)}\{|c_{i,j}|\}) \rfloor$ to the decoder;

2) output $\mu_n$, followed by the pixel coordinates $\eta(k)$ and
sign of each of the $\mu_n$ coefficients such that $2^n \le
|c_{\eta(k)}| < 2^{n+1}$ (sorting pass);

3) output the nth most significant bit of all the coefficients
with $|c_{i,j}| \ge 2^{n+1}$ (i.e., those that had their coordinates
transmitted in previous sorting passes), in the same order
used to send the coordinates (refinement pass);

4) decrement n by one, and go to Step 2).

The algorithm stops at the desired rate or distortion. Nor- 
mally, good quality images can be recovered after a relatively 
small fraction of the pixel coordinates are transmitted. 
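The four steps of Algorithm I can be sketched in a few lines. This is an illustration only: the symbol tuples such as `("mu", count)` are a hypothetical stand-in for an actual bit-level format, and the coefficients are assumed to be integers stored in a dictionary keyed by coordinate.

```python
def algorithm_one(coeffs: dict[tuple[int, int], int], passes: int) -> list[tuple]:
    """Emit the Algorithm I message as symbolic tuples (explicit ordering)."""
    out: list[tuple] = []
    n = max(abs(v) for v in coeffs.values()).bit_length() - 1
    out.append(("n", n))                                   # Step 1
    sorted_so_far: list[tuple[int, int]] = []
    for _ in range(passes):
        if n < 0:
            break
        # Step 2 (sorting pass): coefficients with 2**n <= |c| < 2**(n+1).
        new = [ij for ij, v in coeffs.items() if (abs(v) >> n) == 1]
        out.append(("mu", len(new)))
        out.extend(("coord", ij, "-" if coeffs[ij] < 0 else "+") for ij in new)
        # Step 3 (refinement pass): bit n of coefficients sorted in earlier passes.
        out.extend(("bit", (abs(coeffs[ij]) >> n) & 1) for ij in sorted_so_far)
        sorted_so_far.extend(new)
        n -= 1                                             # Step 4
    return out

coeffs = {(0, 0): 13, (0, 1): -6, (1, 0): 3, (1, 1): 0}
message = algorithm_one(coeffs, passes=3)
```

Every sorting pass here spends a full coordinate on each newly significant coefficient; that explicit ordering cost is what the set partitioning of the next section avoids.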

The fact that this coding algorithm uses uniform scalar 
quantization may give the impression that it must be much 
inferior to other methods that use nonuniform and/or vector 
quantization. However, this is not the case: the ordering infor- 
mation makes this simple quantization method very efficient. 
On the other hand, a large fraction of the "bit-budget" is spent 
in the sorting pass, and it is there that the sophisticated coding 
methods are needed. 

IV. Set Partitioning Sorting Algorithm 

One of the main features of the proposed coding method is 
that the ordering data is not explicitly transmitted. Instead, it 
is based on the fact that the execution path of any algorithm 
is defined by the results of the comparisons on its branching 
points. So, if the encoder and decoder have the same sort- 
ing algorithm, then the decoder can duplicate the encoder's 
execution path if it receives the results of the magnitude 
comparisons, and the ordering information can be recovered 
from the execution path. 

One important fact used in the design of the sorting algorithm
is that we do not need to sort all coefficients. Actually,
we need an algorithm that simply selects the coefficients such
that $2^n \le |c_{i,j}| < 2^{n+1}$, with n decremented in each pass.
Given n, if $|c_{i,j}| \ge 2^n$ then we say that a coefficient is
significant; otherwise it is called insignificant.

The sorting algorithm divides the set of pixels into partitioning
subsets $\mathcal{T}_m$ and performs the magnitude test

$$\max_{(i,j) \in \mathcal{T}_m} \{|c_{i,j}|\} \ge 2^n ? \qquad (6)$$

If the decoder receives "no" as the answer (the subset is
insignificant), then it knows that all coefficients in $\mathcal{T}_m$ are
insignificant. If the answer is "yes" (the subset is significant),
then a certain rule shared by the encoder and the decoder
is used to partition $\mathcal{T}_m$ into new subsets $\mathcal{T}_{m,l}$, and the
significance test is then applied to the new subsets. This set
division process continues until the magnitude test is done to
all single coordinate significant subsets in order to identify
each significant coefficient.
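That the decoder can recover the sorting result from the comparison outcomes alone can be checked with a small sketch. Here the shared partitioning rule is assumed to be a plain quadtree split of a square block (the hypothetical `split`, chosen only for brevity; the rule actually used is the tree-based one of Section V), and the array side is assumed to be a power of two.

```python
import numpy as np

def split(block):
    """Shared rule (assumption): split a (row, col, side) square into four quadrants."""
    r, col, s = block
    h = s // 2
    return [(r, col, h), (r, col + h, h), (r + h, col, h), (r + h, col + h, h)]

def encode(coeffs, n):
    """Output the result of test (6) for every visited set; return bits and finds."""
    bits, found = [], []
    def visit(block):
        r, col, s = block
        sig = int(np.max(np.abs(coeffs[r:r + s, col:col + s])) >= 2 ** n)
        bits.append(sig)
        if sig and s == 1:
            found.append((r, col))
        elif sig:
            for sub in split(block):
                visit(sub)
    visit((0, 0, coeffs.shape[0]))
    return bits, found

def decode(bits, side):
    """Replay the encoder's execution path using only the received bits."""
    stream, found = iter(bits), []
    def visit(block):
        r, col, s = block
        sig = next(stream)
        if sig and s == 1:
            found.append((r, col))
        elif sig:
            for sub in split(block):
                visit(sub)
    visit((0, 0, side))
    return found

c = np.zeros((4, 4))
c[0, 0], c[3, 2] = 20, -9
bits, found = encode(c, 3)           # threshold 2**3 = 8
print(bits)                          # [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
print(decode(bits, 4) == found)      # True
```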

To reduce the number of magnitude comparisons (message
bits) we define a set partitioning rule that uses an expected
ordering in the hierarchy defined by the subband pyramid. The 
objective is to create new partitions such that subsets expected 
to be insignificant contain a large number of elements, and 
subsets expected to be significant contain only one element. 

To make clear the relationship between magnitude comparisons
and message bits, we use the function

$$S_n(\mathcal{T}) = \begin{cases} 1, & \max_{(i,j) \in \mathcal{T}} \{|c_{i,j}|\} \ge 2^n, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

to indicate the significance of a set of coordinates $\mathcal{T}$. To
simplify the notation of single pixel sets, we write $S_n(\{(i, j)\})$
as $S_n(i, j)$.
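Equation (7) translates directly into code. In this sketch the set $\mathcal{T}$ is any iterable of coordinates and `coeffs` is a nested list of integers (both assumptions of the example); an empty set is treated as insignificant, which is a convention of the sketch.

```python
def significance(n: int, coeffs: list[list[int]], coords) -> int:
    """S_n(T) from (7): 1 if some |c_ij| with (i, j) in T reaches 2**n, else 0."""
    return int(any(abs(coeffs[i][j]) >= 2 ** n for i, j in coords))

c = [[13, -6],
     [3, 0]]
print(significance(3, c, [(0, 0)]))            # 1: |13| >= 8
print(significance(3, c, [(0, 1), (1, 0)]))    # 0: both below 8
print(significance(2, c, [(0, 1), (1, 0)]))    # 1: |-6| >= 4
```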

V. Spatial Orientation Trees 

Normally, most of an image's energy is concentrated in
the low frequency components. Consequently, the variance
decreases as we move from the highest to the lowest levels
of the subband pyramid. Furthermore, it has been observed
that there is a spatial self-similarity between subbands, and
the coefficients are expected to be better magnitude-ordered
if we move downward in the pyramid following the same
spatial orientation. [Note the mild requirements for ordering
in (5).] For instance, large low-activity areas are expected
to be identified in the highest levels of the pyramid, and
they are replicated in the lower levels at the same spatial
locations.

A tree structure, called spatial orientation tree, naturally 
defines the spatial relationship on the hierarchical pyramid. 
Fig. 2 shows how our spatial orientation tree is defined in 
a pyramid constructed with recursive four-subband splitting. 
Each node of the tree corresponds to a pixel and is identified 
by the pixel coordinate. Its direct descendants (offspring) 
correspond to the pixels of the same spatial orientation in the 
next finer level of the pyramid. The tree is defined in such a 
way that each node has either no offspring (the leaves) or four 
offspring, which always form a group of 2x2 adjacent pixels. 
In Fig. 2, the arrows are oriented from the parent node to its 
four offspring. The pixels in the highest level of the pyramid
are the tree roots and are also grouped in 2 x 2 adjacent pixels.
However, their offspring branching rule is different, and in
each group, one of them (indicated by the star in Fig. 2) has
no descendants.

The following sets of coordinates are used to present the 
new coding method: 

• $\mathcal{O}(i, j)$: set of coordinates of all offspring of node $(i, j)$;

• $\mathcal{D}(i, j)$: set of coordinates of all descendants of the node $(i, j)$;

• $\mathcal{H}$: set of coordinates of all spatial orientation tree roots
(nodes in the highest pyramid level);

• $\mathcal{L}(i, j) = \mathcal{D}(i, j) - \mathcal{O}(i, j)$.
Fig. 2. Examples of parent-offspring dependencies in the spatial-orientation
tree.

For instance, except at the highest and lowest pyramid
levels, we have

$$\mathcal{O}(i, j) = \{(2i, 2j), (2i, 2j + 1), (2i + 1, 2j), (2i + 1, 2j + 1)\}. \qquad (8)$$
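The sets $\mathcal{O}$, $\mathcal{D}$, and $\mathcal{L}$ can be generated as in the following sketch for a square `size` x `size` pyramid whose top (lowest-frequency) band is `ll` x `ll`. The rule used for the roots, in which the top-left pixel of each 2 x 2 group has no offspring and the other three point to the corresponding 2 x 2 groups of the three detail bands at the same level, is one common way to realize Fig. 2 and should be treated as an assumption; away from the roots the code follows (8).

```python
def offspring(i: int, j: int, size: int, ll: int) -> list[tuple[int, int]]:
    """O(i, j) on a size x size pyramid with an ll x ll top band (ll even)."""
    if i < ll and j < ll:                      # roots, the set H
        if i % 2 == 0 and j % 2 == 0:
            return []                          # starred node: no descendants
        bi = i - i % 2 + (i % 2) * ll
        bj = j - j % 2 + (j % 2) * ll
    elif 2 * i >= size or 2 * j >= size:
        return []                              # finest level: leaves
    else:
        bi, bj = 2 * i, 2 * j                  # equation (8)
    return [(bi, bj), (bi, bj + 1), (bi + 1, bj), (bi + 1, bj + 1)]

def descendants(i: int, j: int, size: int, ll: int) -> list[tuple[int, int]]:
    """D(i, j): offspring, offspring of offspring, and so on."""
    found, stack = [], offspring(i, j, size, ll)
    while stack:
        k, l = stack.pop()
        found.append((k, l))
        stack.extend(offspring(k, l, size, ll))
    return found

def grand_descendants(i: int, j: int, size: int, ll: int) -> list[tuple[int, int]]:
    """L(i, j) = D(i, j) - O(i, j)."""
    kids = set(offspring(i, j, size, ll))
    return [d for d in descendants(i, j, size, ll) if d not in kids]

# Two-level pyramid on an 8 x 8 image: the top band is 2 x 2.
print(offspring(0, 1, 8, 2))                 # [(0, 2), (0, 3), (1, 2), (1, 3)]
print(len(descendants(0, 1, 8, 2)))          # 20 = 4 offspring + 16 grandchildren
print(len(grand_descendants(0, 1, 8, 2)))    # 16
```

With this rule, the three root trees that have descendants cover the remaining 60 pixels of the 8 x 8 example exactly once.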

We use parts of the spatial orientation trees as the partitioning
subsets in the sorting algorithm. The set partitioning rules
are simply the following.

1) The initial partition is formed with the sets $\{(i, j)\}$ and
$\mathcal{D}(i, j)$, for all $(i, j) \in \mathcal{H}$.

2) If $\mathcal{D}(i, j)$ is significant, then it is partitioned into $\mathcal{L}(i, j)$
plus the four single-element sets with $(k, l) \in \mathcal{O}(i, j)$.

3) If $\mathcal{L}(i, j)$ is significant, then it is partitioned into the
four sets $\mathcal{D}(k, l)$, with $(k, l) \in \mathcal{O}(i, j)$.

VI. Coding Algorithm

Since the order in which the subsets are tested for signif- 
icance is important, in a practical implementation the signif- 
icance information is stored in three ordered lists, called list 
of insignificant sets (LIS), list of insignificant pixels (LIP), 
and list of significant pixels (LSP). In all lists each entry is 
identified by a coordinate, (i, j), which in the LIP and LSP 
represents individual pixels, and in the LIS represents either
the set $\mathcal{D}(i, j)$ or $\mathcal{L}(i, j)$. To differentiate between them, we
say that a LIS entry is of type A if it represents $\mathcal{D}(i, j)$, and
of type B if it represents $\mathcal{L}(i, j)$.

During the sorting pass (see Algorithm I), the pixels in
the LIP, which were insignificant in the previous pass, are
tested, and those that become significant are moved to the
LSP. Similarly, sets are sequentially evaluated following the
LIS order, and when a set is found to be significant it is
removed from the list and partitioned. The new subsets with
more than one element are added back to the LIS, while the
single-coordinate sets are added to the end of the LIP or the
LSP, depending on whether they are insignificant or significant,
respectively. The LSP contains the coordinates of the pixels
that are visited in the refinement pass.

Below we present the new encoding algorithm in its entirety. 
It is essentially equal to Algorithm I, but uses the set- 
partitioning approach in its sorting pass. 



Algorithm II: 

1) Initialization: output $n = \lfloor \log_2(\max_{(i,j)}\{|c_{i,j}|\}) \rfloor$;
set the LSP as an empty list, and add the coordinates
$(i, j) \in \mathcal{H}$ to the LIP, and only those with descendants
also to the LIS, as type A entries.

2) Sorting Pass:
   2.1) for each entry $(i, j)$ in the LIP do:
      2.1.1) output $S_n(i, j)$;
      2.1.2) if $S_n(i, j) = 1$ then move $(i, j)$ to the LSP
      and output the sign of $c_{i,j}$;
   2.2) for each entry $(i, j)$ in the LIS do:
      2.2.1) if the entry is of type A then
         • output $S_n(\mathcal{D}(i, j))$;
         • if $S_n(\mathcal{D}(i, j)) = 1$ then
            * for each $(k, l) \in \mathcal{O}(i, j)$ do:
               • output $S_n(k, l)$;
               • if $S_n(k, l) = 1$ then add $(k, l)$ to the
               LSP and output the sign of $c_{k,l}$;
               • if $S_n(k, l) = 0$ then add $(k, l)$ to the
               end of the LIP;
            * if $\mathcal{L}(i, j) \neq \emptyset$ then move $(i, j)$ to the
            end of the LIS, as an entry of type B,
            and go to Step 2.2.2); otherwise, remove
            entry $(i, j)$ from the LIS;
      2.2.2) if the entry is of type B then
         • output $S_n(\mathcal{L}(i, j))$;
         • if $S_n(\mathcal{L}(i, j)) = 1$ then
            * add each $(k, l) \in \mathcal{O}(i, j)$ to the end of
            the LIS as an entry of type A;
            * remove $(i, j)$ from the LIS.

3) Refinement Pass: for each entry $(i, j)$ in the LSP,
except those included in the last sorting pass (i.e., with
same n), output the nth most significant bit of $|c_{i,j}|$;

4) Quantization-Step Update: decrement n by 1 and go
to Step 2.

One important characteristic of the algorithm is that the
entries added to the end of the LIS in Step 2.2) are evaluated
before that same sorting pass ends. So, when we say "for each
entry in the LIS" we also mean those that are being added to its
end. With Algorithm II, the rate can be precisely controlled because
the transmitted information is formed of single bits. The
encoder can also use the property in (4) to estimate the progressive
distortion reduction and stop at a desired distortion value.

Note that in Algorithm II, all branching conditions based on
the significance data $S_n$, which can only be calculated with
the knowledge of $c_{i,j}$, are output by the encoder. Thus, to
obtain the desired decoder's algorithm, which duplicates the
encoder's execution path as it sorts the significant coefficients,
we simply have to replace the word "output" by "input" in
Algorithm II. Comparing the algorithm above to Algorithm
I, we can see that the ordering information $\eta(k)$ is recovered
when the coordinates of the significant coefficients are added
to the end of the LSP; that is, the coefficients pointed to by
the coordinates in the LSP are sorted as in (5). But note that
whenever the decoder inputs data, its three control lists (LIS,
LIP, and LSP) are identical to the ones used by the encoder at
the moment it outputs that data, which means that the decoder
indeed recovers the ordering from the execution path. It is
easy to see that with this scheme, coding and decoding have
the same computational complexity.

Fig. 3. Comparative evaluation of the new coding method.
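The following sketch implements Algorithm II and, following the remark above, obtains the decoder from the same code by turning every "output" into an "input". It is an illustration under stated assumptions, not the authors' software: coefficients are integers in a square NumPy array whose side is a power of two; the tree roots follow a hypothetical rule in which the top-left pixel of each 2 x 2 group of the top band has no offspring (`offspring` is our helper); bits are kept in a plain Python list with no arithmetic coding; and a type B entry is evaluated when it is reached at the end of the LIS. Because the decoder stops cleanly when it runs out of bits, truncating the list simulates the embedded property.

```python
import numpy as np

def offspring(i, j, size, ll):
    """O(i, j); roots use the hypothetical rule above, other nodes use (8)."""
    if i < ll and j < ll:
        if i % 2 == 0 and j % 2 == 0:
            return []
        bi, bj = i - i % 2 + (i % 2) * ll, j - j % 2 + (j % 2) * ll
    elif 2 * i >= size or 2 * j >= size:
        return []
    else:
        bi, bj = 2 * i, 2 * j
    return [(bi, bj), (bi, bj + 1), (bi + 1, bj), (bi + 1, bj + 1)]

def spiht(size, ll, n0, passes, coeffs=None, bits=None):
    """Algorithm II. Encoder if `coeffs` is given (returns the bit list);
    decoder if `bits` is given (returns the reconstruction c_hat)."""
    enc = coeffs is not None
    stream, out = iter(bits or []), []
    c_hat = np.zeros((size, size))

    def io(value):
        # Encoder: compute and output a bit. Decoder: input it instead.
        if enc:
            out.append(int(value()))
            return out[-1]
        return next(stream)                    # StopIteration = bits exhausted

    def desc(i, j):                            # D(i, j); only the encoder calls it
        found, stack = [], offspring(i, j, size, ll)
        while stack:
            k, l = stack.pop()
            found.append((k, l))
            stack.extend(offspring(k, l, size, ll))
        return found

    def sig(coords, n):                        # S_n(T), equation (7)
        return any(abs(coeffs[k, l]) >= 2 ** n for k, l in coords)

    def to_lsp(i, j, n):                       # add to LSP, send sign, set +-1.5 * 2**n
        lsp.append((i, j))
        negative = io(lambda: coeffs[i, j] < 0)
        c_hat[i, j] = (-1.5 if negative else 1.5) * 2 ** n

    roots = [(i, j) for i in range(ll) for j in range(ll)]
    lip, lsp, n = list(roots), [], n0
    lis = [(i, j, "A") for i, j in roots if offspring(i, j, size, ll)]
    try:
        for _ in range(passes):
            if n < 0:
                break
            old = len(lsp)                     # LSP entries from earlier passes
            still = []
            for i, j in lip:                   # step 2.1
                if io(lambda: abs(coeffs[i, j]) >= 2 ** n):
                    to_lsp(i, j, n)
                else:
                    still.append((i, j))
            lip, kept, k = still, [], 0
            while k < len(lis):                # step 2.2, including appended entries
                i, j, kind = lis[k]
                k += 1
                kids = offspring(i, j, size, ll)
                if kind == "A":
                    if not io(lambda: sig(desc(i, j), n)):
                        kept.append((i, j, "A"))
                        continue
                    for a, b in kids:
                        if io(lambda: abs(coeffs[a, b]) >= 2 ** n):
                            to_lsp(a, b, n)
                        else:
                            lip.append((a, b))
                    if any(offspring(a, b, size, ll) for a, b in kids):
                        lis.append((i, j, "B"))   # L(i, j) is not empty
                elif io(lambda: sig([d for a, b in kids for d in desc(a, b)], n)):
                    lis.extend((a, b, "A") for a, b in kids)
                else:
                    kept.append((i, j, "B"))
            lis = kept
            for i, j in lsp[:old]:             # step 3: refinement pass
                bit = io(lambda: (int(abs(coeffs[i, j])) >> n) & 1)
                c_hat[i, j] += np.sign(c_hat[i, j]) * (2.0 ** (n - 1)) * (1 if bit else -1)
            n -= 1                             # step 4
    except StopIteration:
        pass                                   # decoder: truncated embedded stream
    return out if enc else c_hat

rng = np.random.default_rng(1)
c = rng.integers(-40, 41, size=(8, 8))
n0 = int(np.abs(c).max()).bit_length() - 1     # step 1: sent once to the decoder
bits = spiht(8, 2, n0, 32, coeffs=c)
c_full = spiht(8, 2, n0, 32, bits=bits)        # |c_hat| = |c| + 0.5 for nonzero c
c_half = spiht(8, 2, n0, 32, bits=bits[: len(bits) // 2])
```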

An additional task done by the decoder is to update the
reconstructed image. For the value of n when a coordinate is
moved to the LSP, it is known that $2^n \le |c_{i,j}| < 2^{n+1}$.
So, the decoder uses that information, plus the sign bit that
is input just after the insertion in the LSP, to set $\hat{c}_{i,j} =
\pm 1.5 \times 2^n$, the midpoint of that interval. Similarly, during
the refinement pass, the decoder adds or subtracts $2^{n-1}$ to
$\hat{c}_{i,j}$ when it inputs the bits of the binary representation
of $|c_{i,j}|$, increasing $|\hat{c}_{i,j}|$ when the bit is 1 and
decreasing it when the bit is 0, so that the estimate stays at the
midpoint of the halved interval. In this manner, the distortion
gradually decreases during both the sorting and refinement
passes.
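The update rule just described keeps each estimate at the midpoint of the interval allowed by the bits received so far. The hypothetical helper below applies it to a single coefficient that enters the LSP in the pass with threshold $2^m$ (parameter `m_found`) and is then refined in passes $m-1, m-2, \ldots$; after the pass with $n = 0$, the estimate of an integer coefficient is $|c_{i,j}| + 0.5$.

```python
def reconstruct(m_found: int, negative: bool, refinement_bits: list[int]) -> float:
    """Decoder estimate of one coefficient: +-1.5 * 2**m_found when it enters
    the LSP, then +-2**(n-1) for the refinement bit received in pass n."""
    magnitude = 1.5 * 2 ** m_found
    n = m_found - 1                      # first refinement happens one pass later
    for bit in refinement_bits:
        magnitude += 2.0 ** (n - 1) if bit else -(2.0 ** (n - 1))
        n -= 1
    return -magnitude if negative else magnitude

# |c| = 13 = 1101b is found at m = 3; refinement bits for planes 2, 1, 0 are 1, 0, 1.
print(reconstruct(3, False, []))         # 12.0  (midpoint of [8, 16))
print(reconstruct(3, False, [1]))        # 14.0  (midpoint of [12, 16))
print(reconstruct(3, False, [1, 0, 1]))  # 13.5  (midpoint of [13, 14))
```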

As with any other coding method, the efficiency of 
Algorithm II can be improved by entropy-coding its output, 
but at the expense of a larger coding/decoding time. Practical 
experiments have shown that normally there is little to 
be gained by entropy-coding the coefficient signs or the 
bits put out during the refinement pass. On the other 
hand, the significance values are not equally probable,
and there is a statistical dependence between $S_n(i, j)$ and
$S_n(\mathcal{D}(i, j))$ and also between the significance of adjacent
pixels.

We exploited this dependence using the adaptive arithmetic
coding algorithm of Witten et al. [7]. To increase the coding
efficiency, groups of 2 x 2 coordinates were kept together in
the lists, and their significance values were coded as a single
symbol by the arithmetic coding algorithm. Since the decoder
only needs to know the transition from insignificant to significant
(the inverse is impossible), the amount of information
that needs to be coded changes according to the number m of
insignificant pixels in that group, and in each case it can be
conveyed by an entropy-coding alphabet with $2^m$ symbols.
With arithmetic coding it is straightforward to use several
adaptive models [7], each with $2^m$ symbols, $m \in \{1, 2, 3, 4\}$,
to code the information in a group of four pixels.
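The grouping can be sketched as follows. The sketch only forms the symbols and measures their ideal adaptive code length, $-\log_2 p$, under simple frequency-count models (the hypothetical `AdaptiveModel`, one per alphabet size $2^m$); it is not the arithmetic coder of Witten et al., which in practice would spend close to that many bits.

```python
import math

class AdaptiveModel:
    """Frequency counts over `size` symbols, all starting at 1."""
    def __init__(self, size: int):
        self.counts = [1] * size

    def cost_and_update(self, symbol: int) -> float:
        p = self.counts[symbol] / sum(self.counts)
        self.counts[symbol] += 1
        return -math.log2(p)             # ideal code length in bits

def group_symbol(sig_bits: list[int], pending: list[bool]) -> tuple[int, int]:
    """Pack the significance bits of the m still-insignificant pixels of a
    2 x 2 group into one symbol in range(2**m); significant ones are skipped."""
    symbol, m = 0, 0
    for bit, still_insignificant in zip(sig_bits, pending):
        if still_insignificant:
            symbol |= bit << m
            m += 1
    return symbol, m

models = {m: AdaptiveModel(2 ** m) for m in (1, 2, 3, 4)}

def code_group(sig_bits: list[int], pending: list[bool]) -> float:
    """Bits an ideal coder would spend on one group; nothing when m = 0."""
    symbol, m = group_symbol(sig_bits, pending)
    return 0.0 if m == 0 else models[m].cost_and_update(symbol)

print(group_symbol([1, 0, 0, 1], [True] * 4))                  # (9, 4)
print(group_symbol([1, 1, 0, 1], [False, True, True, False]))  # (1, 2)
print(code_group([0, 0, 0, 0], [True] * 4))                    # 4.0 on first use
```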

By coding the significance information together, the average 
bit rate corresponds to an mth order entropy. At the same 
time, by using different models for the different number of 
insignificant pixels, each adaptive model contains probabilities 
conditioned on the fact that a certain number of adjacent
pixels are significant or insignificant. This way the dependence
between magnitudes of adjacent pixels is fully exploited. The 
scheme above was also used to code the significance of trees 
rooted in groups of 2 x 2 pixels. 

Fig. 4. Images obtained with the arithmetic code version of the new coding method. (a) Original LENA, (b) rate = 0.5 bpp, PSNR = 37.2 dB, (c)
rate = 0.25 bpp, PSNR = 34.1 dB, (d) rate = 0.15 bpp, PSNR = 31.9 dB.



TABLE I

Effect of Entropy-Coding the Significance Information
on the CPU Times (s) to Code and Decode the
Image Lena 512 x 512 (IBM RS/6000 Workstation)

Rate      Binary uncoded           Arithmetic coded
(bpp)     code         decode      code      decode
0.25      0.07         0.04        0.18      0.14
0.50      [illegible]  0.09        0.33      0.29
1.00      0.27         0.17        0.64      0.57



With arithmetic entropy-coding it is still possible to produce 
a coded file with the exact code rate and possibly a few unused 
bits to pad the file to the desired size. 

VII. Numerical Results

The following results were obtained with monochrome, 8
bpp, 512 x 512 images. Practical tests have shown that the
pyramid transformation does not have to be exactly unitary,
so we used five-level pyramids constructed with the 9/7-tap
filters of [5], and using a "reflection" extension at the image
edges. It is important to observe that the bit rates are not
entropy estimates; they were calculated from the actual size
of the compressed files. Furthermore, by using the progressive
transmission ability, the sets of distortions are obtained from
the same file, that is, the decoder read the first bytes of the
file (up to the desired rate), calculated the inverse subband
transformation, and then compared the recovered image with
the original. The distortion is measured by the peak signal to
noise ratio

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right) \mathrm{dB} \qquad (9)$$

where MSE denotes the mean squared-error between the
original and reconstructed images.
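For reference, (9) as a function. The peak value 255 corresponds to the 8-bpp images used here; returning infinity for identical images is a convention of this sketch, not something the text specifies.

```python
import math
import numpy as np

def psnr(original, reconstructed, peak: float = 255.0) -> float:
    """Equation (9): 10 * log10(peak**2 / MSE), in dB."""
    err = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = float(np.mean(err ** 2))
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
print(round(psnr(a, a + 1.0), 2))   # 48.13 (MSE = 1)
```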

Results are obtained both with and without entropy-coding
the bits put out with Algorithm II. We call the version without
entropy coding binary-uncoded. In Fig. 3 are plotted the PSNR
versus rate obtained for the luminance (Y) component of
LENA both for binary uncoded and entropy-coded using
arithmetic code. Also in Fig. 3, the same is plotted for the
luminance image GOLDHILL. The numerical results with
arithmetic coding surpass in almost all respects the best
efforts previously reported, despite their sophisticated and 
computationally complex algorithms (e.g., [5], [8]-[10], [13], 
[15]). Even the numbers obtained with the binary uncoded 
versions are superior to those in all these schemes, except 
possibly the arithmetic and entropy constrained trellis quan- 
tization (ACTCQ) method in [11]. PSNR versus rate points 
for competitive schemes, including the latter one, are also 
plotted in Fig. 3. The new results also surpass those in the 
original EZW [1] and are comparable to those for extended 
EZW in [6], which along with ACTCQ rely on arithmetic 
coding. The binary uncoded figures are only 0.3-0.6 dB lower 
in PSNR than the corresponding ones of the arithmetic coded 
versions, showing the efficiency of the partial ordering and 
set partitioning procedures. If one does not have access to 
the best CPU's and wishes to achieve the fastest execution, 
one could opt to omit arithmetic coding and suffer little 
consequence in PSNR degradation. Intermediary results can 
be obtained with, for example, Huffman entropy-coding. A 
recent work [12], which reports similar performance to our 
arithmetic coded ones at higher rates, uses arithmetic and 
trellis coded quantization (ACTCQ) with classification in 
wavelet subbands. However, at rates below about 0.5 bpp, 
ACTCQ is not as efficient and classification overhead is not 
insignificant. 

Note in Fig. 3 that in both PSNR curves for the image 
LENA there is an almost imperceptible "dip" near 0.7 bpp. 
It occurs when a sorting pass begins, or equivalently, a new
bit-plane begins to be coded, and is due to a discontinuity
in the slope of the rate-distortion curve. In previous EZW
versions [1], [6], this "dip" is much more pronounced, of up 
to 1 dB PSNR, meaning that their embedded files did not yield 
their best results for all rates. Fig. 3 shows that the new version 
does not present the same problem. 

In Fig. 4, the original images are shown along with their cor- 
responding reconstructions by our method (arithmetic coded 
only) at 0.5, 0.25, and 0.15 bpp. There are no objectionable ar- 
tifacts, such as the blocking prevalent in JPEG-coded images, 
and even the lowest rate images show good visual quality. 
Table I shows the corresponding CPU times, excluding the 
time spent in the image transformation, for coding and decod- 
ing LENA. The pyramid transformation time was 0.2 s in an 
IBM RS/6000 workstation (model 590, which is particularly
efficient for floating-point operations). The programs were not 
optimized to a commercial application level, and these times 
are shown just to give an indication of the method's speed. 
The ratio between the coding/decoding times of the different 
versions can change for other CPU's, with a larger speed 
advantage for the binary-uncoded version. 

VIII. Summary and Conclusions

We have presented an algorithm that operates through set 
partitioning in hierarchical trees (SPIHT) and accomplishes 
completely embedded coding. This SPIHT algorithm uses the 
principles of partial ordering by magnitude, set partitioning 
by significance of magnitudes with respect to a sequence of
octavely decreasing thresholds, ordered bit plane transmission,
and self-similarity across scale in an image wavelet trans- 
form. The realization of these principles in matched coding 
and decoding algorithms is a new one and is shown to be 
more effective than in previous implementations of EZW 
coding. The image coding results in most cases surpass those 
reported previously on the same images, which use much 
more complex algorithms and do not possess the embedded 
coding property and precise rate control. The software and 
documentation, which are copyrighted and under patent ap- 
plications, may be accessed in the Internet site with URL 
http://ipl.rpi.edu/SPIHT or obtained by anonymous
ftp to ipl.rpi.edu with the path pub/EW_Code in the
compressed archive file codetree.tar.gz. (The file must
be decompressed with the command gunzip and exploded
with the command "tar xvf"; the instructions are in the
file codetree.doc.) We feel that the results of this coding
algorithm with its embedded code and fast execution are so 
impressive that it is a serious candidate for standardization in 
future image compression systems. 
Abstract:

We introduce a new zerotree scheme that effectively exploits the 
inter-scale self-similarities found in the octave decomposition by a 
wavelet transform. A zerotree is useful to code wavelet 
coefficients and its effectiveness was proved by Shapiro's (1993) 
EZW (embedded zerotree wavelet). In the coding scheme, wavelet 
coefficients are symbolized and then entropy-coded. The entropy 
per symbol is determined from the produced symbols and the final 
coded size is calculated by multiplying the entropy and the total 
number of symbols. We analyze symbols produced from the EZW 
and discuss the entropy per symbol. Since the entropy depends 
on the produced symbols, we modify the procedure of symbol 
generation. First, we extend the relation between a parent and 
children used in the EZW to raise the probability such that a 
significant parent has significant children. The proposed relation is 
flexibly extended according to the fact that a significant coefficient 
is likely to have significant coefficients in its neighborhood. Our 
coding results are compared with the published results of Shapiro 
and improvements come from the use of lower entropy per 
symbol. We also give a comparison of the number of produced 
symbols.



Index Terms: 

data compression; entropy codes; image coding; transform coding;
wavelet transforms; EZW; children; embedded zerotree wavelet; entropy
per symbol; entropy-coded coefficients; inter-scale self-similarities;
low entropy; octave decomposition; parent; probability; significant
coefficients; symbol generation; wavelet coefficients; wavelet
transform; zerotree based compression; zerotree coding
ABSTRACT We introduce a new zerotree scheme that effectively
exploits the inter-scale self-similarities found in the octave decomposition by
a wavelet transform. A zerotree is useful to code wavelet coefficients and its 
effectiveness was proved by Shapiro's EZW. In the coding scheme, wavelet 
coefficients are symbolized and then entropy-coded. The entropy per 
symbol is determined from the produced symbols and the final coded size is 
calculated by multiplying the entropy and the total number of symbols. 

In this paper, we analyze symbols produced from the EZW and discuss 
the entropy per symbol. Since the entropy depends on the produced 
symbols, we modify the procedure of symbol generation. First, we extend 
the relation between a parent and children used in the EZW to raise the 
probability such that a significant parent has significant children. The 
proposed relation is flexibly extended according to the fact that a significant 
coefficient is likely to have significant coefficients in its neighborhood. 

Our coding results are compared with the published results in paper [1] 
and improvements come from the use of lower entropy per symbol. We also 
give a comparison of the number of produced symbols.

KEYWORDS: image compression, wavelet transform, zerotree coding

I. INTRODUCTION

In wavelet-based coding, dependencies among bands have been
exploited using quadtrees in EZW (Embedded Zerotree Wavelet) [1],
SPIHT (Set Partitioning In Hierarchical Trees) [2] and SFQ (Space
Frequency Quantization) [3, 4]; that is, one coefficient at a given
band is related to the four coefficients at the same spatial
location in the next finer band through a parent-children relation,
and this relation is applied to all coefficients except the DC
coefficients. The EZW coder was designed by Shapiro, who first
applied an embedded zerotree to wavelet coefficients. The algorithm
is based on three concepts: 1) prediction of the absence of
significant coefficients across scales by exploiting the
self-similarity inherent in images, 2) successive approximation of
the decoded coefficients, and 3) adaptive arithmetic coding of the
output symbols. Later, Said and Pearlman published their notable
work, SPIHT, which gives good performance with fast processing.
SPIHT uses three lists to find significant coefficients in the
bands: 1) a list of insignificant pixels (LIP) for insignificant
coefficients, 2) a list of significant pixels (LSP) for significant
coefficients, and 3) a list of insignificant sets (LIS) for
insignificant descendants. Each LIS entry describes its descendants
in one of two forms, type A or type B. The decoder duplicates the
three lists identically from the transmitted bit stream. A key merit
of EZW and SPIHT is that encoding or decoding can be terminated at
any point, and the decoder reconstructs an approximate image from
the information it has received so far. This property is clearly
desirable for constrained communication channels. More recently,
Z. Xiong et al. published an SFQ algorithm that surpasses EZW and
SPIHT in performance; it comes in two versions according to the
wavelet decomposition. One of them uses the octave-band wavelet and
the other uses the wavelet packet. SFQ obtains its coding
performance by pruning branches from trees in a rate-distortion
sense and scalar-quantizing the coefficients at the surviving nodes.
Whether a branch is pruned depends on the pre-assigned bit budget
and on a comparison of the costs of pruning and not pruning.

Most coding schemes have two common procedures: 1) symbol
generation (model transformation) and 2) entropy coding of the
symbol stream. The symbol stream is produced to represent the data,
and the symbols are then entropy-coded. In this paper, we introduce
a new zerotree scheme that leads to lower entropy and thus more
compression. Since the entropy per symbol is determined by the
probabilities of the produced symbols, we modify the symbol
generation procedure with flexible treeing; that is, the tree is
designed flexibly with entropy in view. In the EZW scheme, a node
on a tree branches out into four nodes, and this relation is
referred to as a fixed relation in the sense that it never changes.
In contrast, our proposed relation is referred to as a "flexible
tree" in the sense that a node branches into the four basic nodes
and may flexibly extend further branches to neighboring nodes. The
flexible tree is built around the question of how to extend these
additional branches.

II. ZEROTREE BASED COMPRESSION
1. Embedded Zerotree Wavelet coding

Jerome M. Shapiro [1] developed an algorithm that exploits the
relation between subbands in image compression. In the algorithm,
zerotrees are combined with bit-plane coding, demonstrating the
effectiveness of wavelet-based coding. The algorithm is based on
zerotrees, which efficiently represent many insignificant
coefficients. Since wavelet coefficients exhibit dependencies across
bands, these dependencies are well exploited with a quadtree
structure. The compression is a three-step procedure: 1) wavelet
decomposition, 2) symbol generation, and 3) entropy coding. We
briefly review the coding algorithm and discuss the produced symbols
and their entropy. To describe the compression scheme, we borrow
several definitions (parent, child, ancestor, descendant, root,
etc.) from [1].

Two types of passes are performed: 1) a dominant pass and 2) a
subordinate pass. The dominant pass finds the coefficients that are
significant with respect to a given threshold, and the subordinate
pass refines all significant coefficients found in all previous
dominant passes. Four symbols are used to convey a dominant pass to
the decoder. A ZTR symbol is used for a zerotree root, a coefficient
that is insignificant and has no significant descendants. Another
needed symbol is the Isolated Zero (IZ), used when a coefficient is
insignificant but has some significant descendants. Besides these,
two symbols are used for a significant coefficient, POS and NEG,
according to its sign. In short, the ZTR and IZ symbols exist to
convey the locations of the significant coefficients (POS and NEG)
as efficiently as possible.
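The quadtree relation and the four dominant-pass symbols can be illustrated with a short sketch. It assumes a square coefficient array in the usual dyadic pyramid layout, where coefficient (i, j) is the parent of the 2x2 block at (2i, 2j), and, to keep the example small, a coarsest (DC) band consisting of the single coefficient (0, 0). The function names are illustrative, and the sketch only classifies one coefficient; it does not reproduce the full EZW scan, which also skips coefficients already found significant and the descendants of zerotree roots.

```python
def children(i, j):
    # Quadtree relation: (i, j) is the parent of the 2x2 block at the same
    # spatial location in the next finer band (dyadic pyramid layout).
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]


def descendants(i, j, size):
    # Assumption: the coarsest (DC) band is the single coefficient (0, 0),
    # which is excluded because its "children" would include itself.
    if (i, j) == (0, 0):
        raise ValueError("the DC coefficient is not handled in this sketch")
    out, frontier = [], children(i, j)
    while frontier:
        frontier = [(r, c) for r, c in frontier if r < size and c < size]
        out.extend(frontier)
        frontier = [ch for r, c in frontier for ch in children(r, c)]
    return out


def dominant_symbol(coeffs, i, j, threshold):
    # Classify one coefficient of a square array for a dominant pass.
    x = coeffs[i][j]
    if abs(x) >= threshold:
        return "POS" if x >= 0 else "NEG"
    if any(abs(coeffs[r][c]) >= threshold
           for r, c in descendants(i, j, len(coeffs))):
        return "IZ"   # insignificant, but some descendant is significant
    return "ZTR"      # insignificant, and so are all of its descendants


coeffs = [[60, 0, 0, 0],
          [0, 5, 0, 0],
          [0, 0, 40, 0],
          [0, 0, 0, -35]]
print([dominant_symbol(coeffs, *ij, threshold=32)
       for ij in [(0, 1), (1, 0), (1, 1), (2, 2), (3, 3)]])
# ['ZTR', 'ZTR', 'IZ', 'POS', 'NEG']
```

Coefficient (1, 1) becomes an IZ only because its descendant (2, 2) is significant; that is exactly the kind of symbol the flexible relation later tries to avoid.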

After a dominant pass, a subordinate pass is performed to refine
the coefficients found to be significant in the previous dominant
passes, and the symbols of both passes are entropy-coded with an
adaptive arithmetic coder [7].

2. Shannon's entropy

As reviewed in the previous subsection, a symbol stream is
produced by the alternating passes, and the stream is then
entropy-coded for further compression. In this subsection, we
briefly review Shannon's entropy in order to analyze the symbol
stream.

Let $S = \{s_1, s_2, \ldots, s_n\}$ be a set of $n$ symbols. Given
$D = \{d_1, d_2, \ldots, d_l\}$, a data set of $l$ symbols in a
sequence (the number $l$ is also called the data length of $D$),
the probability distribution of the symbol set $S$ in the data $D$
is the collection of positive numbers $P = \{p_1, p_2, \ldots, p_n\}$,
one for each symbol, defined by

$$p_i = \frac{\left|\{d_k \in D \mid d_k = s_i\}\right|}{l}, \qquad i = 1, 2, \ldots, n. \qquad (1)$$

If the probability distribution is the only assumed redundancy
information, the pair $(S, P)$ is called a zero-order Markov source,
and the data sequence $D$ is called a zero-order Markov sequence.

Using the above notation, the (zero-order) entropy of the
data sequence $D$ is defined to be

$$e(D) = -\sum_{i=1}^{n} p_i \log_2 p_i. \qquad (2)$$
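Equations (1) and (2) translate directly into code. In the minimal sketch below (the function names are illustrative, not from the paper), symbol occurrences are counted to obtain each $p_i$, and $-p_i \log_2 p_i$ is summed; symbols that never occur are simply absent, matching the definition of $P$ as a collection of positive numbers.

```python
from collections import Counter
from math import log2


def probabilities(data):
    # Eq. (1): p_i = |{d_k in D : d_k = s_i}| / l, for every symbol that occurs.
    l = len(data)
    return {s: count / l for s, count in Counter(data).items()}


def entropy(data):
    # Eq. (2): e(D) = -sum_i p_i log2 p_i, in bits per symbol.
    return -sum(p * log2(p) for p in probabilities(data).values())


print(probabilities("ZZSSS"))                # {'Z': 0.4, 'S': 0.6}
print(entropy(["S1"] * 10 + ["S2"] * 10))    # 1.0
```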

3. The relation between the number of symbols and their entropy

Using Shannon's entropy, we consider the relation between the
number of symbols and their entropy. The entropy per symbol largely
depends on the occurrence probabilities of the symbols in the
alphabet, and the final coded size is calculated by multiplying the
average entropy by the number of symbols. Therefore, the final coded
size can be reduced in two ways: one is to reduce the entropy per
symbol and the other is to reduce the number of symbols.

We assume two particular source models for this discussion. Both
models are composed of two symbols (S1 and S2), but their
probability distributions and numbers of symbols differ. Assume the
first model has ten S1 and ten S2. In this case, the symbol
probabilities and the entropy are calculated using equations 1 and
2. The model is uniformly distributed and its entropy is 1
bit/symbol, so 20 bits are needed for the model. On the other hand,
assume the second model has five S1 symbols and a 20-bit budget, and
consider how many S2 symbols can be inserted within the 20 bits.
Using equations 1 and 2, we can insert at least 20 S2 symbols into
the source. Therefore, both outputs fit the same 20-bit size even
though the sources have different lengths. It is important to
compare the two distributions: in the second model, five S1 are
worth ten S2 if we consider only the number of symbols, i.e., one S1
is worth two S2. If the replacement is accomplished without any
change to the content, our attention turns to the number of symbols
used in the replacement. When each S1 can be replaced by fewer than
two S2 symbols, we expect more compression with less entropy, even
though the total number of symbols increases. This consideration is
discussed in the next section with a more detailed example, and we
apply it to EZW through our flexible tree structure.
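The two-model argument can be checked numerically. The sketch below assumes, as the text does, that the coded size is the number of symbols multiplied by the zero-order entropy of eq. (2), i.e. an ideal entropy coder; `coded_size` and `max_insertable` are illustrative helper names.

```python
from math import log2


def coded_size(counts):
    # Ideal coded size in bits: number of symbols times the zero-order
    # entropy of eq. (2); symbols with a zero count contribute nothing.
    total = sum(counts)
    h = -sum(n / total * log2(n / total) for n in counts if n)
    return total * h


def max_insertable(s1_count, budget):
    # Largest number of S2 symbols that fit alongside s1_count S1 symbols
    # without the ideal coded size exceeding the budget (in bits).
    n = 0
    while coded_size([s1_count, n + 1]) <= budget:
        n += 1
    return n


print(coded_size([10, 10]))             # first model: 20.0 bits
print(round(coded_size([5, 20]), 2))    # second model: 18.05 bits, within 20
print(max_insertable(5, 20))            # 26
```

Under this idealized measure, five S1 symbols leave room for up to 26 S2 symbols in 20 bits, which is consistent with the text's statement that at least 20 can be inserted.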
Figure III-1. An example explaining the difference between the 1-4 relation and the 1-9
relation.

Table III-1. The possible numbers of ZTR according to the variable numbers of IZ for a
target budget of 80 bits.

Case   POS   NEG   IZ   ZTR   Total symbols   Entropy (bits/symbol)   Budget (bits)
  1     10    10   10    10        40                 2.000                 80
  2     10    10    9    12        41                 1.992                 80
  3     10    10    8    13        41                 1.978                 80
  4     10    10    7    14        41                 1.958                 80
  5     10    10    6    15        41                 1.929                 80
  6     10    10    5    18        43                 1.866                 80
  7     10    10    4    21        45                 1.788                 80
  8     10    10    3    25        48                 1.683                 80
  9     10    10    2    30        52                 1.553                 80
 10     10    10    1    38        59                 1.377                 80

As reviewed in the previous section, a dominant pass in EZW tells
where the coefficients that are significant with respect to a given
threshold are located and which signs they have. In the pass, four
symbols (ZTR, POS, NEG and IZ) convey the locations and signs. Once
an image has been decomposed with a wavelet transform, the number of
significant coefficients for a given threshold is fixed. Therefore,
it is the numbers of ZTR and IZ symbols that determine the length of
the symbol stream and its entropy. We now consider the occurrence of
these symbols. A ZTR is produced when a coefficient and all of its
descendants are insignificant, whereas an IZ is produced when a
coefficient is insignificant but some of its descendants are
significant.

Assuming that the numbers of POS and NEG are each fixed at ten,
Table III-1 shows the relation between the numbers of ZTR and IZ for
a target budget of 80 bits. The table shows how many ZTR symbols can
be coded into the stream as the number of IZ symbols decreases one at
a time. For a clear comparison, consider only cases 1 and 10. In case
1, there are ten IZ symbols, and therefore ten ZTR symbols can be
coded within the target size (80 bits). In case 10, on the other
hand, there is only one IZ, and thus 38 ZTR symbols can be coded
within the same target size. Although the target sizes are the same,
different numbers of symbols are coded. Compared with case 1, case 10
can be interpreted as nine fewer IZ symbols being replaced by 28
additional ZTR symbols. The ratio 28/9 means that one IZ is worth
about 3.1 ZTR symbols. We therefore conclude that, for entropy
coding, it is more effective to use three ZTR symbols rather than one
IZ, whenever such a replacement is possible. Our interest then lies in
whether such a symbol replacement is possible, and at what ratio.
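The entropy column of Table III-1 and the 28/9 replacement ratio can be recomputed from eq. (2). The sketch below assumes, as the text does, that POS and NEG each occur ten times. The (IZ, ZTR) pairs are copied from the table.

```python
from math import log2


def entropy(counts):
    # Zero-order entropy (eq. 2) from symbol counts, in bits per symbol.
    total = sum(counts)
    return -sum(n / total * log2(n / total) for n in counts if n)


# (IZ, ZTR) pairs copied from Table III-1; POS = NEG = 10 in every case.
CASES = [(10, 10), (9, 12), (8, 13), (7, 14), (6, 15),
         (5, 18), (4, 21), (3, 25), (2, 30), (1, 38)]

for case, (iz, ztr) in enumerate(CASES, start=1):
    counts = [10, 10, iz, ztr]
    h = entropy(counts)
    print(f"case {case:2d}: {sum(counts)} symbols, {h:.3f} bits/symbol, "
          f"{sum(counts) * h:.1f} bits")

iz_removed = CASES[0][0] - CASES[-1][0]   # 10 - 1 = 9
ztr_added = CASES[-1][1] - CASES[0][1]    # 38 - 10 = 28
print(f"ZTR symbols gained per IZ removed: {ztr_added / iz_removed:.2f}")
```

The printed entropies agree with the table to three decimals. The ideal sizes come out near 80 bits rather than exactly at it (about 81.2 bits for case 10), so the 80-bit budget is best read as approximate.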

To implement the symbol replacement, we use a two-step procedure:
the first step decreases the number of IZ symbols, and the second
replaces them with several ZTR symbols to compensate for the
decrease. We first suggest a way to decrease IZ symbols. An
insignificant coefficient is coded as an IZ when some of its
descendants are significant. In other words, an IZ occurs for one
reason only: significant descendants belong to an insignificant
ancestor. Therefore, a simple solution is to suppress this occurrence
by letting the significant descendants belong to a significant
ancestor. This is only possible if the descendants can select their
ancestors. In EZW, the relation does not allow such a selection and
always maintains one parent to four children; this is referred to as
a fixed relation. The relation can be modified into another form so
that some children can select their parent. Selecting a parent means
that there must be several candidate parents, so a modified relation
must have an overlapped form. It can be made in various forms; one of
them is suggested in Figure III-1 (a). Four parents are displayed at
the parent level, each with its own shape; these shapes help to show
the relations between the parents and their children. Each parent has
nine children, and some children are shared by several candidate
parents; that is, a child can belong to one or more parents. In other
words, a child can be scanned after any one of its candidate parents.
This is referred to as a modified relation: one parent to nine
children.
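Because Figure III-1 is not reproduced here, the exact placement of the nine children is unknown. The sketch below makes a loudly hypothetical choice: parent (i, j) owns the 3x3 block whose top-left 2x2 corner is its usual EZW child block at (2i, 2j), clipped to the child band. It only illustrates the overlap property the text describes, namely that some children have several candidate parents. `children_1_9` and `candidate_parents` are illustrative names, not the paper's.

```python
def children_1_9(i, j, size):
    # Hypothetical 1-9 relation: parent (i, j) owns the 3x3 block of the
    # child band starting at its usual 2x2 child block (2i, 2j), clipped
    # to a size x size child band.
    return [(r, c)
            for r in range(2 * i, 2 * i + 3)
            for c in range(2 * j, 2 * j + 3)
            if r < size and c < size]


def candidate_parents(r, c, parent_size):
    # Every parent in a parent_size x parent_size band whose block contains (r, c).
    return [(i, j)
            for i in range(parent_size)
            for j in range(parent_size)
            if (r, c) in children_1_9(i, j, 2 * parent_size)]


print(len(children_1_9(0, 0, 8)))    # 9 children when the block fits
print(candidate_parents(2, 2, 4))    # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(candidate_parents(1, 1, 4))    # [(0, 0)]
```

Under this assumed geometry, a child on an even row and column (other than row or column 0) is shared by up to four parents, while the others keep a single parent as in the fixed relation.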

We give a simple example to explain the modified relation. Assume
that there are two parents at the parent level, one significant and
the other insignificant, and six significant children (named C1 to
C6) at the child level, as shown in Figure III-1 (b) and (c). Our goal
is to find all significant coefficients under a scanning order in
which no child is scanned before any parent. We will use two
relations to find them: the fixed and the modified relations. There
are seven coefficients, one parent and six children, to be found as
significant. We first find them with the fixed relation, as shown in
Figure III-1 (b). The significant parent P2 has four significant
children, and they are scanned under P2. However, the insignificant
parent P1 also has two significant children; thus P1 must be coded
with an IZ symbol so that these two significant children can be
found. Therefore, the symbol stream in this case is IS (at parent
level), SZSZ SSSS (at child level), where S, I and Z denote a
significant coefficient, an isolated zero and a zerotree root,
respectively. The stream has seven S, two Z and one I symbols. On
the other hand, when the modified relation is applied to the
example, as shown in Figure III-1 (c), all significant children
belong to one significant parent, and thus no IZ symbol is needed. In
this case, the symbol stream is ZS (at parent level), ZZZSSSSSS (at
child level): seven S and four Z symbols. As the above explanation
shows, two different symbol streams were obtained for the same
example by using two different relations. We see that the relation
between a parent and its children plays an important role in
producing a symbol stream: depending on the relation, the kind and
the number of produced symbols differ, and thus the resulting
entropy differs. For cases (b) and (c), the entropies are 1.157
bits/symbol and 0.946 bits/symbol, respectively, and the
entropy-coded sizes are 11.57 bits and 10.41 bits. Comparing the two
streams, one I symbol in case (b) was replaced with two Z symbols in
case (c). After all, when we change one relation for another, the
important question is how many Z symbols are added in exchange for
the removed I symbols. The ratio between the increase and the
decrease will be an important factor for entropy coding.

To decrease this ratio, we replace the modified 1-9 relation with a
flexible relation. In the previous example, the 1-9 relation was more
efficient than the fixed relation because it used no I symbols.
However, that is only a special example chosen to explain the
relation between the numbers of I and Z symbols. If P1 were also
significant in the example, the symbol stream of case (b) would not
include any I symbol, and only two Z symbols would be needed. This
means that the modified relation is not always better than the fixed
relation. Therefore, we need a general relation that compromises
between these two relations.

We exploit the dependencies among neighboring coefficients for that
purpose. This can be realized with a flexible relation, in which the
number of children a parent has varies between four and nine. To
define the flexible relation, we divide the nine children into four
groups, named G1, G2, G3 and G4, as shown in Figure III-1 (d). A
parent always has the first group G1 and selectively has the
remaining groups G2, G3 and G4, each of which is selected only when
the first child in that group is significant. For example, G2 is
selected when C2 is significant; in this case, the parent has the
groups G1 and G2, and the six children C1 to C6 belong to the
parent. Therefore, G1 should be scanned before G2, G3 and G4. In
short, the selection of the remaining groups G2, G3 and G4 is
determined by the significance of the children C2, C4 and C5 in G1.

Returning to the previous example, we apply the flexible relation.
The first group G1 of P1 has no significant children, and thus P1 is
coded as a zerotree root. The next parent, P2, has two significant
children, C1 and C2, in G1; therefore, its children groups are G1,
G3 and G4. In this case, P2 has eight of the nine children, all
except the second child of G2. The resulting stream is ZS (at parent
level), ZZSS (from G1), SS (from G3) and SS (from G4); it contains
seven S and three Z symbols. Note that no child needs to be scanned
twice. The flexible relation thus needs one Z symbol fewer than the
modified relation.
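The three symbol streams of this example can be compared directly with the zero-order entropy of eq. (2). The streams below are transcribed from the text (S = significant, I = isolated zero, Z = zerotree root, parent level first). The size is the ideal one, the number of symbols times the entropy.

```python
from collections import Counter
from math import log2


def entropy(stream):
    # Zero-order entropy (eq. 2) of a symbol stream, in bits per symbol.
    n = len(stream)
    return -sum(k / n * log2(k / n) for k in Counter(stream).values())


STREAMS = {
    "fixed 1-4 (b)": "IS" + "SZSZSSSS",
    "modified 1-9 (c)": "ZS" + "ZZZSSSSSS",
    "flexible": "ZS" + "ZZSS" + "SS" + "SS",
}

for name, stream in STREAMS.items():
    h = entropy(stream)
    print(f"{name:17s} {len(stream):2d} symbols  {h:.3f} bits/symbol  "
          f"{len(stream) * h:5.2f} bits")
```

Cases (b) and (c) reproduce the 1.157 and 0.946 bits/symbol quoted above. The exact ideal size of (c) is 10.40 bits; the text's 10.41 comes from multiplying the rounded 0.946 by 11. Under the same idealized measure, the flexible stream comes to about 0.881 bits/symbol and 8.81 bits. This last figure is computed here, not reported by the authors.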
IV. EXPERIMENTAL RESULTS



Table IV-1. Comparisons of produced symbols in numbers for the same



Our flexible tree is designed to reduce the number of IZ symbols and
thus lower the entropy. Defining an extended relation removes IZ
symbols at the cost of some additional ZTR symbols, and the flexible
treeing exploits the ratio between this decrease and increase
efficiently. We use two standard images, Lenna and Barbara (512 × 512,
grey-scale), from the RPI site,
ftp://ipl.rpi.edu/pub/image/still/usc. All our results are based on a
six-scale octave wavelet transform, and we use the 9/7 filter of [5]
with mirror extensions at the boundaries. The performance is compared
with the published results of [1], and the curves are plotted in
Fig. IV-1 (a) and (b) for the two images, respectively. PSNR is
calculated over the range from 256 to 32768 bytes. Our flexible coder
performs 0.2-0.7 dB better than the EZW coder. The improvements come
from symbol replacement, in which the frequent symbol (ZTR) is used
instead of the infrequent symbol (IZ) as often as possible. The
performance curves show that the replacements are well accomplished
with the flexible relation.

In addition, we compare the numbers of symbols produced by EZW and by
our coder. To give an exact comparison, we stop coding as soon as the
threshold becomes 16; that is, coding is terminated once the dominant
and subordinate passes have been completed for the threshold 32. The
same condition is applied to the Barbara image, and the results are
given in Table IV-1 (b). As the table shows, the numbers of POS and
NEG symbols are the same, but the compressed sizes are different:
with the flexible relation, some IZ symbols disappear at the cost of
additional ZTR symbols, so the two coders produce different symbol
streams. Compared with EZW, our coder produces 1575 fewer IZ symbols
and 4704 more ZTR symbols for the Barbara image. The replacement
ratio is therefore the increase in ZTR divided by the decrease in IZ:
4704 / 1575 = 2.99, meaning that each removed IZ symbol was replaced
by 2.99 ZTR symbols on average. The removed IZ symbols lower the
entropy, so the image can be compressed to a smaller size. The
entropies of the two symbol distributions are 1.274 bits/symbol for
EZW and 1.195 bits/symbol for the proposed coder. With this lower
entropy, the image is compressed more compactly even though the
total number of symbols increases.
Figure IV-1. Performance curves for the test images: (a) Lenna, (b) Barbara.



(b) Barbara (512 × 512, 8-bit grey image, original size = 262144 bytes)

V. CONCLUSIONS 

We described a new relation in which a parent takes its children in a
flexible and selectable way. We extend the fixed relation used in the
EZW scheme in order to decrease the entropy per symbol. The entropy is
lowered by using more of the frequent symbols and fewer of the
infrequent ones. In EZW, the infrequent symbol is IZ, and its use can
be avoided by extending the parent-child relation: a parent has nine
children, some of which are shared with neighboring parents. With the
extended relation, the number of IZ symbols decreases.

We showed that a symbol stream is coded with less entropy using the
flexible parent-child relation. Experimentally, our flexible coder
performs 0.2-0.7 dB better than the EZW coder. We expect that the
flexible coder can be further improved by a more efficient
parent-child relation.
Abstract:

In many modern image compression techniques, the coefficients 
of an image transform are encoded by successive refinements. 
This is in general equivalent to successively encoding each 
bit-plane of the image transforms. Such methods, especially when 
the wavelet transform is employed, are among the 
state-of-the-art in image coding. In fact, several proposals to the 
JPEG 2000 standard use some sort of bit-plane encoding. 
Bit-plane encoding has been traditionally used as a scalar 
quantization technique, that is, each coefficient is individually
decomposed in bit-planes. We analyze extensions of the bit-plane 
encoding concept to vectors, whereby each vector of the 
coefficients is decomposed in “vector bit-planes”. 
First, we formally describe the vector bit-plane representations, 
and then state a theorem concerning conditions for their 
existence. Next, we propose a modification to the proposed vector 
bit-plane representations and state a second theorem, which 
shows that the proposed modifications lead to much more robust 
algorithms. Simulation results are presented for the embedded 
encoding of wavelet coefficients of images, which confirm the
potential advantages of vector over scalar bit-plane
representations. These strongly indicate that vector bit-plane
representations should be further investigated.
ABSTRACT

In many modern image compression techniques, the co- 
efficients of an image transform are encoded by suc- 
cessive refinements. This is in general equivalent to 
successively encoding each bit-plane of the image trans- 
forms. Such methods, especially when the wavelet trans- 
form is employed, are among the state-of-the-art in im- 
age coding. In fact, several proposals to the JPEG 2000 
standard use some sort of bit-plane encoding. Bit-plane 
encoding has been traditionally used as a scalar quantization
technique, that is, each coefficient is individually
decomposed in bit-planes. In this paper we analyze extensions
of the bit-plane encoding concept to vectors,
whereby each vector of coefficients is decomposed in
"vector bit-planes". First, we formally describe the vector
bit-plane representations, and then state a theorem
concerning conditions for their existence. Next, we pro- 
pose a modification to the proposed vector bit-plane rep- 
resentations and state a second theorem, which shows 
that the proposed modifications lead to much more ro- 
bust algorithms. Simulation results are presented for 
the embedded encoding of wavelet coefficients of im- 
ages, which confirm the potential advantages of vector 
over scalar bit-plane representations. These strongly 
indicate that vector bit-plane representations should be 
further investigated. 

1. INTRODUCTION 

Wavelet transforms have been widely investigated for 
image coding applications. Among the wavelet image 
coding methods, the ones that are based on bit-plane 
encoding have become very popular. In these methods, 
the wavelet coefficients are coded in successive passes. 
In each pass, one bit-plane of the wavelet coefficients
is encoded. In general, the similarities among bands
of the same orientation are taken into consideration in the
encoding of the bit-planes, by the use of zero-trees or
similar structures. Good examples of bit-plane wavelet
coders can be found in [1, 2, 3]. Although not restricted
to wavelet coders, [4] is also a nice example of this recent
trend toward bit-plane encoding. The performance
of wavelet coders based on bit-plane encoding
places them among the state-of-the-art in image coding.

Besides the good performance that can be obtained 
with bit-plane encoding of wavelet coefficients, bit-plane 
encoders have the advantage of naturally generating 
an embedded bitstream. In other words, an embedded 
bitstream with, for example, 1 bit/pixel, contains all 
the bitstreams with less than 1 bit/pixel. If we par- 
tially decode it up to 0.5 bit/pixel, we would obtain 
the same image that would be obtained if a bitstream 
of 0.5 bit/pixel was generated and decoded in the first 
place. 
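The embedded property can be illustrated with a deliberately simplified bit-plane coder. In this sketch, signs are sent up front and magnitude bit-planes follow, most significant first, with no zero-trees and no entropy coding. These simplifications and the names `encode` and `decode` are assumptions for illustration only. Truncating the stream after the signs and k complete planes gives exactly what a coder limited to the top k planes would have produced, and it decodes to the same coarser image.

```python
def encode(values, planes):
    # Sign bits first, then magnitude bit-planes from most to least significant.
    bits = [1 if v < 0 else 0 for v in values]
    for p in range(planes - 1, -1, -1):
        bits.extend((abs(v) >> p) & 1 for v in values)
    return bits


def decode(bits, n, planes):
    # Rebuild n values from any prefix holding the signs plus k complete planes.
    signs, rest = bits[:n], bits[n:]
    mags = [0] * n
    for k in range(len(rest) // n):          # a trailing partial plane is ignored
        p = planes - 1 - k
        for idx in range(n):
            mags[idx] |= rest[k * n + idx] << p
    return [-m if s else m for s, m in zip(signs, mags)]


values = [13, -6, 2, 0]
full = encode(values, planes=4)              # 4 sign bits + 4 planes x 4 bits
print(decode(full, 4, 4))                    # [13, -6, 2, 0]
print(decode(full[:4 + 2 * 4], 4, 4))        # [12, -4, 0, 0]
```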

Bit-plane encoding is a form of scalar quantization, because each coefficient is individually decomposed into bit-planes. However, coding efficiency requires that each bit-plane be encoded considering sets of coefficients; zero-trees [1], run-length coding [3] and arithmetic encoding using conditional probabilities [5] are examples of this. It is therefore natural to ask whether there are non-trivial and efficient generalizations of bit-plane encoding to vectors. In other words, are there efficient ways of combining the advantages of bit-plane encoding and vector quantization? The answer is affirmative: [6] describes a coder consisting of a straightforward substitution of the "scalar" bit-plane encoder in [1] by a kind of "vector" bit-plane encoder, and this coder has shown an encouraging improvement in performance over the one in [1].
In this paper, the general problem of bit-plane encoding of vectors is first analyzed. Then, two novel theorems which establish sufficient conditions for vector bit-plane encoding to be possible are stated, and some of their properties are analyzed. Based on these theorems, some improvements to the coder in [6] are proposed. Finally, simulation results are presented, and extensions to this work are proposed.

2. VECTOR BIT-PLANES 

Decomposing a scalar quantity $-1 < c < 1$ into bit-planes is equivalent to representing it through a sequence $\{b_1, b_2, \dots, b_i, \dots\}$ such that:

$$ c = s \sum_{i=1}^{\infty} b_i \, 2^{-i} \tag{1} $$

where $s \in \{-1, 1\}$ represents the sign of $c$ and $b_i \in \{0, 1\}$.

Alternatively, we can have $s$ always equal to 1 and $b_i \in \{1, -1\}$, yielding the representation below:

$$ c = \sum_{i=1}^{\infty} b_i \, 2^{-i} \tag{2} $$

For example, in [2], $b_i \in \{0, 1\}$ and a representation like the one in eq. 1 is used. In [1], $b_i \in \{1, -1\}$ and coefficients are represented as in eq. 2.

In coding applications, the summation in eq. 2 is obviously not infinite, but goes from 1 to $P$, the number of bit-planes, yielding the approximation $c_P$. In general, the more bit-planes are added to the summation, the smaller the distortion $|c - c_P|$ in the representation of $c$. $P$ is often chosen as the smallest value such that a certain distortion criterion is met, i.e. $|c - c_P| < \Delta$.
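Both truncated scalar forms can be checked numerically. The sketch below (Python; the function names are ours, not taken from [1] or [2]) computes the first $P$ digits of eq. 1 by binary expansion of $|c|$, and the first $P$ digits of eq. 2 greedily from the sign of the running residual, which is one simple way of obtaining $b_i \in \{1, -1\}$. Both truncations keep $|c - c_P|$ within $2^{-P}$.

```python
def sign_magnitude_bitplanes(c, P):
    """Eq. 1 form, truncated to P bit-planes: returns (s, [b_1, ..., b_P])
    with s in {-1, 1} and b_i in {0, 1}; assumes -1 < c < 1."""
    s = -1 if c < 0 else 1
    mag = abs(c)
    # b_i is the i-th binary digit of |c| after the binary point
    bits = [int(mag * 2**i) % 2 for i in range(1, P + 1)]
    return s, bits


def signed_bitplanes(c, P):
    """Eq. 2 form, truncated to P bit-planes: b_i in {1, -1}, chosen greedily
    as the sign of the current residual, so that |c - c_P| <= 2**-P."""
    bits, r = [], c
    for i in range(1, P + 1):
        b = 1 if r >= 0 else -1
        bits.append(b)
        r -= b * 2.0**-i          # residual after adding b_i 2^-i
    return bits


def reconstruct(bits, s=1):
    """c_P = s * sum_{i=1}^{P} b_i 2^-i."""
    return s * sum(b * 2.0**-(i + 1) for i, b in enumerate(bits))


s, b = sign_magnitude_bitplanes(-0.3, 10)
print(s, b, reconstruct(b, s))    # -1 [0, 1, 0, 0, 1, 1, 0, 0, 1, 1] -0.2998046875
print(signed_bitplanes(0.5, 3))   # [1, 1, -1]
```

Note that with $b_i \in \{1, -1\}$ no finite sum equals 0.5 exactly, so the eq. 2 approximation of 0.5 with three bit-planes is 0.625, still within $2^{-3}$.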

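A trivial way to do vector bit-plane encoding of an $N$-dimensional vector $\mathbf{v}$ is simply to represent each of its coordinates $v_k$, for $k = 1, \dots, N$, using bit-planes. Therefore, from eq. 2, the vector $\mathbf{v}$ can be represented as:

$$ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{\infty} b_{1i} 2^{-i} \\ \sum_{i=1}^{\infty} b_{2i} 2^{-i} \\ \vdots \\ \sum_{i=1}^{\infty} b_{Ni} 2^{-i} \end{pmatrix} \tag{3} $$

Eq. 3 is equivalent to:

$$ \mathbf{v} = \sum_{i=1}^{\infty} \begin{pmatrix} b_{1i} \\ b_{2i} \\ \vdots \\ b_{Ni} \end{pmatrix} 2^{-i} = \sum_{i=1}^{\infty} \mathbf{b}_i \, 2^{-i} \tag{4} $$

We can see that eq. 4 is a vector version of eq. 2, that is, every vector $\mathbf{v}$ can be represented by a sequence of vectors $\{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_i, \dots\}$. If $b_{ki} \in \{1, -1\}$, $\mathbf{b}_i$ belongs to the codebook $T_N$ whose vectors are of the form $((-1)^{j_1}, (-1)^{j_2}, \dots, (-1)^{j_N})^T$, with $j_k \in \{0, 1\}$. Since all vectors $\mathbf{b}_i$ have magnitude equal to $\sqrt{N}$, they are located on a hyper-sphere, thus representing different orientations. For this reason, $T_N$ can also be referred to as an orientation codebook.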
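A sketch of the trivial extension in eqs. 3-5 (Python; helper names are hypothetical), reusing the greedy signed digits of eq. 2 for each coordinate. Regrouping the $i$-th digits of all coordinates yields the vectors $\mathbf{b}_i$ of $T_N$, each of norm $\sqrt{N}$.

```python
import math


def signed_bitplanes(c, P):
    """Scalar eq. 2 digits b_i in {1, -1} (greedy sign of the residual)."""
    bits, r = [], c
    for i in range(1, P + 1):
        b = 1 if r >= 0 else -1
        bits.append(b)
        r -= b * 2.0**-i
    return bits


def trivial_vector_bitplanes(v, P):
    """Eqs. 3-5: bit-plane each coordinate, then regroup the i-th digits of all
    coordinates into the vector b_i = (b_1i, ..., b_Ni), a member of T_N."""
    per_coordinate = [signed_bitplanes(vk, P) for vk in v]  # N lists of P digits
    return [tuple(col) for col in zip(*per_coordinate)]      # P vectors of N digits


def reconstruct(planes):
    """Eq. 5: v_P = sum_{i=1}^{P} b_i 2^-i, coordinate by coordinate."""
    N = len(planes[0])
    return [sum(b[k] * 2.0**-(i + 1) for i, b in enumerate(planes)) for k in range(N)]


v = [0.3, -0.7, 0.1, 0.55]
planes = trivial_vector_bitplanes(v, 12)
# first bit-plane vector and its norm, sqrt(N) = 2 for N = 4
print(planes[0], math.hypot(*planes[0]))
```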

Eq. 4 means that every vector $\mathbf{v}$ whose components are smaller than 1 in magnitude can be represented as a series of vectors scaled by decreasing factors ($2^{-i}$), with orientations drawn from a fixed orientation codebook $T_N$. As in the scalar case, the summations are not infinite, and go from 1 to the number of bit-planes $P$, yielding an approximation $\mathbf{v}_P$ of $\mathbf{v}$ as below:

$$ \mathbf{v}_P = \sum_{i=1}^{P} \mathbf{b}_i \, 2^{-i} \tag{5} $$

In coding applications, what is wanted is the smallest possible $P$ (or, more exactly, the smallest possible entropy of the ensemble of vectors $\mathbf{b}_i$) such that the distortion $\|\mathbf{v} - \mathbf{v}_P\| < \Delta$.

In this trivial extension of bit-plane encoding to vectors, the orientation codebook is composed of vectors whose components are 1 and -1 (eq. 4). At this point, a natural question to ask is whether there are other orientation codebooks for which vector bit-plane encoding is more efficient. More precisely, we are looking for representations of a vector $\mathbf{v}$ having the form:

$$ \mathbf{v}_Q = \sum_{i=1}^{Q} \alpha^{i} \, \mathbf{u}_{n_i} \tag{6} $$

where $\mathbf{u}_{n_i} \in C_N$, an orientation codebook composed of unit vectors on a hyper-sphere. We should also observe that the terms $2^{-i}$ in eq. 4 have been replaced by terms $\alpha^i$, $0 < \alpha < 1$.

We want $\mathbf{v}_Q$ to approximate $\mathbf{v}$ arbitrarily well for $Q$ sufficiently large, that is,

$$ \lim_{Q \to \infty} \mathbf{v}_Q = \mathbf{v} \tag{7} $$



Suppose that $P$ and $Q$ are the minimum numbers of terms in eqs. 5 and 6, respectively, such that the respective approximation errors are smaller than $\Delta$. Then the representation in eq. 6 is more efficient than the one in eq. 4 if the entropy of the ensemble of vectors $\mathbf{b}_i$, $i = 1, \dots, P$, is larger than that of the ensemble of vectors $\mathbf{u}_{n_i}$, $i = 1, \dots, Q$, and vice versa.

In order to investigate more efficient vector bit-plane representations, one must first determine under what conditions a general vector bit-plane representation as in eq. 6 exists such that eq. 7 holds. This is dealt with in the next section.

3. EXISTENCE OF VECTOR BIT-PLANE 
REPRESENTATIONS 

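In order to determine conditions for the decompositions in eq. 6 to satisfy eq. 7, we first have to define one important parameter of an $N$-dimensional orientation codebook $C_N$, which we refer to as $\theta(C_N)$¹. It is the maximum possible angle between any vector in $\mathbb{R}^N$ and its nearest neighbor in $C_N$. More precisely,

$$ \theta(C_N) = \cos^{-1}\left( \min_{\mathbf{v} \in \mathbb{R}^N \setminus \{\mathbf{0}\}} \left\{ \max_{i \in \{1, \dots, M\}} \left\{ \frac{\mathbf{v} \cdot \mathbf{u}_i}{\|\mathbf{v}\|} \right\} \right\} \right) \tag{8} $$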

We then have the following theorem:

Theorem 1. Given an orientation codebook $C_N = \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_M\}$ such that $\|\mathbf{u}_i\| = 1$ for all $i$, there exists a representation as in eq. 6 such that eq. 7 is valid for all $\mathbf{v} \in \mathbb{R}^N$, $\|\mathbf{v}\| < 1$, if:

$$ \sin[\theta(C_N)] < \alpha < 1, \qquad \theta(C_N) > 45^\circ \tag{10} $$

It is important to point out that this theorem only establishes sufficient conditions for eq. 7 to be valid for a representation as in eq. 6. In fact, its proof assumes worst-case conditions: the angle between the residual $\mathbf{e}_i = \mathbf{v} - \mathbf{v}_i$ and $\mathbf{u}_{n_{i+1}}$ is assumed always to be equal to $\theta(C_N)$, which is clearly very pessimistic.

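An important property that can be deduced from eqs. 9 and 10 is that the smaller the value of $\theta(C_N)$, the smaller $\alpha$ can be. We can estimate the impact of the value of $\alpha$ in eq. 6 by noting that, if eq. 7 is valid, the magnitude of the error of a $Q$-term approximation, $\mathbf{e}_Q = \mathbf{v} - \mathbf{v}_Q$, is bounded by

$$ \|\mathbf{e}_Q\| = \Big\| \sum_{i=Q+1}^{\infty} \alpha^{i} \, \mathbf{u}_{n_i} \Big\| \le \sum_{i=Q+1}^{\infty} \alpha^{i} = \frac{\alpha^{Q+1}}{1 - \alpha} \tag{11} $$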
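Eq. 11 and the eq. 10 range can be evaluated directly. The sketch below (Python; function names are hypothetical) takes the $\theta$ values listed in Table 1 as given and prints the lower end of the admissible $\alpha$ range from eq. 10 for the codebooks with $\theta > 45^\circ$.

```python
import math


def truncation_bound(alpha, Q):
    """Eq. 11: ||e_Q|| <= sum_{i=Q+1}^inf alpha^i = alpha^(Q+1) / (1 - alpha)."""
    if not 0 < alpha < 1:
        raise ValueError("eq. 11 requires 0 < alpha < 1")
    return alpha ** (Q + 1) / (1 - alpha)


def theorem1_alpha_range(theta_deg):
    """Eq. 10: sin(theta) < alpha < 1, stated for theta(C_N) > 45 degrees."""
    if theta_deg <= 45:
        raise ValueError("eq. 10 covers theta(C_N) > 45 degrees only")
    return math.sin(math.radians(theta_deg)), 1.0


# theta values (degrees) as listed in Table 1
for name, theta in [("T_4", 60), ("T_8", 69), ("T_16", 76), ("Lambda_16", 55)]:
    lo, _ = theorem1_alpha_range(theta)
    print(f"{name}: alpha in ({lo:.3f}, 1)")

print(truncation_bound(0.5, 3))   # 0.125
```

For $\Lambda_{16}$ ($\theta = 55^\circ$) eq. 10 only guarantees convergence for $\alpha$ above roughly 0.82, while the experiments reported later in this paper peak near $\alpha \approx 0.62$; this is consistent with the remark that the worst-case assumption behind Theorem 1 is pessimistic.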

Eq. 11 shows that the smaller the value of $\alpha$, the smaller the bound on the residual approximation error. Therefore, from a coding point of view, it is desirable to have the smallest possible value of $\alpha$ in order to minimize the approximation error. Then, from eqs. 9 and 10, the orientation codebook $C_N$ should be such that the value of $\theta(C_N)$ is as small as possible. For any given dimension, there are two ways of reducing $\theta(C_N)$: (i) by increasing the number $M$ of vectors; (ii) by distributing the vectors "more uniformly" over the $N$-dimensional unit sphere. However, when the number $M$ of vectors in $C_N$ is increased, there is a compromise: despite the decrease in the value of $\theta(C_N)$ and, consequently, of $\alpha$ and of the truncation error using $Q$ vectors, there will be an increase in the entropy of the set $\mathbf{u}_{n_i}$, $i = 1, \dots, Q$. Therefore, for the entropy to be maintained, $Q$ would have to be reduced, thereby increasing the distortion. Thus, the best way to obtain a codebook with a low value of $\theta(C_N)$ is to have its vectors distributed as "uniformly" as possible over the unit sphere in $\mathbb{R}^N$ for a given number $M$ of vectors. Good examples of codebooks satisfying this property are the first shells of the lattices which solve the sphere packing problem in $N$ dimensions [7]. For example, analyzing eq. 4, which describes the case of a mere concatenation of the bit-planes of the vector components, it can be proven that the codebook $T_N$ has $\theta(T_N) = \cos^{-1}(\sqrt{1/N})$. Table 1 compares the number of vectors and the values of $\theta(T_N)$ with those of the first shells of the lattices $D_4$, $E_8$ and $\Lambda_{16}$, which solve the sphere packing problem in dimensions 4, 8 and 16.

¹ In [6] it has been referred to as $\theta_{max}$.



Codebook        M        θ
$T_4$           16       60°
$D_4$           24       45°
$T_8$           256      69°
$E_8$           240      45°
$T_{16}$        65536    76°
$\Lambda_{16}$  4320     55°

Table 1: Values of θ and number M of vectors for several orientation codebooks.



From this table we can clearly see the superiority of the codebooks derived from the lattices that solve the sphere packing problem. For example, $T_8$, despite having more vectors than $E_8$, has a much larger value of $\theta$. This implies that its vectors are much less uniformly distributed than those of $E_8$, and therefore vector bit-plane representations based on it are less efficient.
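Some of the $\theta$ values in Table 1 can be checked with a short script. The sketch below (Python; helper names are hypothetical) builds $T_N$, the 24-vector first shell of $D_4$ and the 240-vector first shell of $E_8$, all scaled to unit length, and measures the angle between a probe direction and its nearest codeword. Since $\theta(C_N)$ in eq. 8 is a maximum over all directions, a single probe only gives a lower bound. For $T_N$ the coordinate axis $\mathbf{e}_1$ is exactly the worst case, giving $\cos^{-1}(\sqrt{1/N})$, and for $D_4$ and $E_8$ the same probe already reaches the 45° listed in the table.

```python
import itertools
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def trivial_codebook(N):
    """T_N: all (+-1, ..., +-1) vectors, scaled to unit length."""
    return [normalize(s) for s in itertools.product((1, -1), repeat=N)]


def pairs_shell(N):
    """Vectors with two entries +-1 and the rest 0 (the D_4 shell for N = 4)."""
    pts = []
    for i, j in itertools.combinations(range(N), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            v = [0] * N
            v[i], v[j] = si, sj
            pts.append(normalize(v))
    return pts


def e8_shell():
    """First shell of E_8: the 112 vectors of pairs_shell(8) plus the 128
    vectors (+-1/2, ..., +-1/2) with an even number of minus signs."""
    halves = [normalize([0.5 * s for s in signs])
              for signs in itertools.product((1, -1), repeat=8)
              if signs.count(-1) % 2 == 0]
    return pairs_shell(8) + halves


def angle_to_nearest(v, codebook):
    """Angle (degrees) between v and the codeword with the largest inner product."""
    u = normalize(v)
    best = max(sum(a * b for a, b in zip(u, c)) for c in codebook)
    return math.degrees(math.acos(min(1.0, best)))


def e1(N):
    """Probe direction: the first coordinate axis."""
    return [1] + [0] * (N - 1)


print(len(pairs_shell(4)), len(e8_shell()))            # 24 240
print(round(angle_to_nearest(e1(4), trivial_codebook(4))),
      round(angle_to_nearest(e1(8), trivial_codebook(8))),
      round(angle_to_nearest(e1(4), pairs_shell(4))),
      round(angle_to_nearest(e1(8), e8_shell())))      # 60 69 45 45
```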

An algorithm for computing vector bit-plane 
representations 

Theorem 1 described conditions for the existence of vector bit-plane representations as in eq. 6 satisfying eq. 7. An important point when it comes to practical applications is whether there is a fast algorithm for computing such representations. Fortunately, the answer is affirmative, and is given by the following greedy algorithm²:

1. Set $m = 0$, $\mathbf{e}_0 = \mathbf{v}$ and $\beta = \alpha$.

2. Given the vector $\mathbf{e}_m$, choose $i_{m+1} \in \{1, \dots, M\}$, where $M$ is the size of the orientation codebook $C_N$, such that:

$$ \mathbf{e}_m \cdot \mathbf{u}_{i_{m+1}} = \max\{\mathbf{e}_m \cdot \mathbf{u}_k : 1 \le k \le M\} $$

3. Compute $\mathbf{e}_{m+1} = \mathbf{e}_m - \beta \, \mathbf{u}_{i_{m+1}}$.

4. Increment $m$, multiply $\beta$ by $\alpha$ and go to step 2.

² It should be noted that it is being assumed, without loss of generality, that $\|\mathbf{v}\| < 1$.
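The four steps translate almost line by line into code. In this sketch (Python; names are ours) the orientation codebook is assumed to be the unit-norm first shell of $D_4$ and $\alpha = 0.9$ is an arbitrary choice. The decoder only needs the indices $i_1, \dots, i_Q$ and $\alpha$ to rebuild $\mathbf{v}_Q$ through eq. 6.

```python
import itertools
import math


def d4_codebook():
    """Unit-norm first shell of D_4 (24 vectors); the codebook choice is ours."""
    r = 1 / math.sqrt(2)
    cb = []
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            u = [0.0] * 4
            u[i], u[j] = si * r, sj * r
            cb.append(tuple(u))
    return cb


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def greedy_vector_bitplanes(v, codebook, alpha, Q):
    """Steps 1-4, run for Q passes; returns [i_1, ..., i_Q] and e_Q."""
    e, beta, indices = list(v), alpha, []          # step 1: e_0 = v, beta = alpha
    for _ in range(Q):
        # step 2: codeword with the largest inner product with the residual
        scores = [dot(e, u) for u in codebook]
        k = scores.index(max(scores))
        indices.append(k)
        # step 3: e_{m+1} = e_m - beta * u_{i_{m+1}}
        e = [x - beta * u for x, u in zip(e, codebook[k])]
        beta *= alpha                               # step 4
    return indices, e


def decode(indices, codebook, alpha):
    """Eq. 6: v_Q = sum_{i=1}^{Q} alpha^i u_{n_i}; only the indices are sent."""
    vq = [0.0] * len(codebook[0])
    for i, k in enumerate(indices, start=1):
        vq = [a + alpha**i * u for a, u in zip(vq, codebook[k])]
    return vq


v = (0.3, -0.5, 0.2, 0.6)                      # ||v|| < 1, as in footnote 2
idx, e = greedy_vector_bitplanes(v, d4_codebook(), alpha=0.9, Q=50)
print(math.sqrt(dot(e, e)), 0.9**50)            # residual norm vs alpha^Q
```

Picking the codeword with the largest inner product is the same as minimizing $\|\mathbf{e}_m - \beta\mathbf{u}\|$ in that pass, since $\|\mathbf{e}_m - \beta\mathbf{u}\|^2 = \|\mathbf{e}_m\|^2 - 2\beta\,\mathbf{e}_m \cdot \mathbf{u} + \beta^2$ for a unit-norm $\mathbf{u}$.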

One should note that this algorithm determines, in each pass, the vector $\mathbf{u}_{i_{m+1}}$ which is closest to the residual $\mathbf{e}_m$, and therefore minimizes the error in the representation of $\mathbf{e}_m$ in that pass. However, this procedure is not guaranteed to generate the optimum representation, that is, the one which yields the minimum representation error after $Q$ passes. More precisely, if the algorithm above generates a sequence of vectors $\mathbf{u}_{i_1}, \mathbf{u}_{i_2}, \dots, \mathbf{u}_{i_Q}$ which, according to eq. 6, provides an approximation $\mathbf{v}_{1Q}$, there is no guarantee that there is not a different sequence of vectors $\mathbf{u}_{j_1}, \mathbf{u}_{j_2}, \dots, \mathbf{u}_{j_Q}$ providing an approximation $\mathbf{v}_{2Q}$ according to eq. 6 such that $\|\mathbf{v} - \mathbf{v}_{2Q}\| < \|\mathbf{v} - \mathbf{v}_{1Q}\|$. In other words, besides the fact that a representation as in eq. 6 satisfying eq. 7 is not unique, the above algorithm will not necessarily find the optimum one. Fortunately, in most cases, the approximation it finds performs well enough.

At this point it is instructive to point out that the above algorithm has some similarities to Mallat's matching pursuit algorithm [8]. The main difference is that in Mallat's matching pursuit the $\alpha^i$ term in eq. 6 is replaced by the projection of $\mathbf{e}_{i-1} = \mathbf{v} - \mathbf{v}_{i-1}$ on $\mathbf{u}_{n_i}$. More precisely, after $Q$ passes, a matching pursuit decomposition of a vector $\mathbf{v}$ is of the form:

$$ \mathbf{v}_Q = \sum_{i=1}^{Q} \gamma_i \, \mathbf{u}_{n_i} \tag{12} $$

where, as in eq. 6, $\mathbf{u}_{n_i} \in C_N$, an orientation codebook composed of unit vectors on a hyper-sphere. On the other hand, unlike eq. 6,

$$ \gamma_i = (\mathbf{v} - \mathbf{v}_{i-1}) \cdot \mathbf{u}_{n_i} \tag{13} $$

This implies that, while in the vector bit-plane representation a vector is represented by just a sequence of unit vectors $\mathbf{u}_{n_1}, \mathbf{u}_{n_2}, \dots$, in Mallat's matching pursuit a vector is represented by a sequence of unit vectors $\mathbf{u}_{n_1}, \mathbf{u}_{n_2}, \dots$ plus a sequence of projections $\gamma_1, \gamma_2, \dots$.
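For comparison, a matching-pursuit sketch over the same assumed $D_4$ codebook (Python; names are ours). The only change from the greedy loop is the weight: the projection $\gamma_i$ of eq. 13 replaces $\alpha^i$, so in a coder each $\gamma_i$ would also have to be quantized and transmitted, which is not modelled here. Because $\|\mathbf{u}\| = 1$, every pass removes exactly $\gamma_i^2$ from the residual energy, and because the $D_4$ shell contains $-\mathbf{u}$ for every $\mathbf{u}$, taking the largest inner product is the same as taking the largest absolute one.

```python
import itertools
import math


def d4_codebook():
    """Unit-norm first shell of D_4 (24 vectors), assumed for this sketch."""
    r = 1 / math.sqrt(2)
    cb = []
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            u = [0.0] * 4
            u[i], u[j] = si * r, sj * r
            cb.append(tuple(u))
    return cb


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(v, codebook, Q):
    """Eqs. 12-13: returns [(n_i, gamma_i)] for i = 1..Q and the residual."""
    e, terms = list(v), []
    for _ in range(Q):
        scores = [dot(e, u) for u in codebook]
        k = scores.index(max(scores))   # symmetric codebook: max == max |.|
        gamma = scores[k]               # eq. 13: projection of e_{i-1} on u_{n_i}
        terms.append((k, gamma))
        e = [x - gamma * u for x, u in zip(e, codebook[k])]
    return terms, e


v = (0.3, -0.5, 0.2, 0.6)
terms, e = matching_pursuit(v, d4_codebook(), 20)
print(terms[:3], dot(e, e))
```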



Performance of vector bit-plane encoding in the 
context of embedded wavelet coding 

The vector bit-plane decomposition described above has been used in place of the conventional bit-plane decomposition in an EZW-like [1] algorithm. Details can be found in [6]. The results there showed a performance improvement of around 1 dB for the 512x512 LENA image in the vicinity of 0.5 bit/pixel when the lattice $\Lambda_{16}$ is used. As expected from the above discussion, the performance of the algorithm varies a great deal with $\alpha$. For example, with $\Lambda_{16}$, the PSNR reaches a fairly pronounced peak for $\alpha$ around 0.62 (see the "conventional algorithm" results in Figure 1). This agrees with what has been discussed above in relation to Theorem 1: if $\alpha$ is too small, eq. 7 is not valid, that is, the magnitude of the error $\mathbf{e}_Q$ does not tend to zero as $Q \to \infty$; if $\alpha$ gets too large, despite eq. 7 being valid, $\|\mathbf{e}_Q\|$ increases (see eq. 11). The main problem with this sort of behaviour is that the $\alpha$ value which gives peak performance is image dependent. That is inconvenient in a number of applications, especially when encoding time should be kept as low as possible.

In the next section we propose another theorem, together with a modification to the vector bit-plane decomposition which solves this problem and provides almost peak performance over a large range of $\alpha$ values.

4. A MODIFIED VECTOR BIT-PLANE 
ENCODER 

It can be shown that, in order for eq. 7 to hold for any $0 < \alpha < 1$, the algorithm above has to be modified to guarantee that the residual in pass $m$ is such that $\alpha^{m+1} \le \|\mathbf{e}_m\| \le \alpha^m$. This is done as follows. First, the zero vector is added to the codebook $C_N$. If the magnitude of $\mathbf{e}_m$ is smaller than $\alpha^{m+1}$, $\mathbf{u}_{i_{m+1}}$ is chosen to be the zero vector, so that there is no refinement in that pass. If the magnitude of $\mathbf{e}_m$ is greater than $\alpha^m$, $\beta$ (see step 3) is not multiplied by $\alpha$ in that pass. In a practical algorithm this can be signalled to the decoder by the inclusion of an escape code before the vector for that pass. With this modification, eq. 6 becomes:

$$ \mathbf{v} = \sum_{i=1}^{\infty} \sum_{j=1}^{p(\mathbf{v}, i)} \alpha^{i} \, \mathbf{u}_{n_{ij}} \tag{14} $$

Then, a new theorem can be stated, stronger than Theorem 1:

Theorem 2. Suppose that the orientation codebook used in vector bit-plane encoding has $\theta(C_N) < 60^\circ$. Then a decomposition such as the one in eq. 14 exists for every $0 < \alpha < 1$.

It is important to notice that, in order for the vector bit-plane decomposition in eq. 14 to be practical, $p(\mathbf{v}, i)$ should be small with high probability. For this reason, we used only $0.5 < \alpha < 1$ in our experiments. Indeed, we observed in these experiments that $p(\mathbf{v}, i) = 1$ occurred with a probability near 1 for these values of $\alpha$.
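The modification is described above only in prose, so the sketch below is one possible reading of it (Python; our interpretation, not the coder of [6]), again with the assumed $D_4$ shell and a deliberately small $\alpha = 0.3$, well below the eq. 10 range. Pass $i$ works at scale $\alpha^i$: if the residual is already within $\alpha^i$ the zero vector is sent; otherwise codewords are subtracted at the same scale (each extra one flagged by an escape code) until it is. Since $\theta(C_N) < 60^\circ$ means the best inner product exceeds $\|\mathbf{e}\|/2$, each such subtraction shrinks the residual while $\|\mathbf{e}\| > \alpha^i$.

```python
import itertools
import math


def d4_codebook():
    """Unit-norm first shell of D_4 (24 vectors), assumed for this sketch."""
    r = 1 / math.sqrt(2)
    cb = []
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            u = [0.0] * 4
            u[i], u[j] = si * r, sj * r
            cb.append(tuple(u))
    return cb


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def modified_vector_bitplanes(v, codebook, alpha, Q, max_steps=64):
    """Returns one list per pass: the codeword indices used at scale alpha**i
    (None = zero vector), so len(passes[i - 1]) plays the role of p(v, i)."""
    e, passes = list(v), []
    for i in range(1, Q + 1):
        beta = alpha**i
        used = []
        if norm(e) <= beta:
            used.append(None)                  # zero vector: no refinement
        while norm(e) > beta and len(used) < max_steps:
            scores = [dot(e, u) for u in codebook]
            k = scores.index(max(scores))
            used.append(k)                     # extra entries need an escape code
            e = [x - beta * u for x, u in zip(e, codebook[k])]
        passes.append(used)
    return passes, e


def decode_modified(passes, codebook, alpha):
    """Eq. 14 truncated to Q passes: sum_i sum_j alpha^i u_{n_ij}."""
    vq = [0.0] * len(codebook[0])
    for i, used in enumerate(passes, start=1):
        for k in used:
            if k is not None:
                vq = [a + alpha**i * u for a, u in zip(vq, codebook[k])]
    return vq


v = (0.3, -0.5, 0.2, 0.6)
passes, e = modified_vector_bitplanes(v, d4_codebook(), alpha=0.3, Q=8)
print([len(p) for p in passes], norm(e), 0.3**8)
```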

Another point is that, like the algorithm derived from Theorem 1, this new algorithm does not necessarily lead to an optimal representation, but its performance is sufficiently good.

Experimental results 

We have applied the modifications leading to Theorem 2 to the vector bit-plane coder in [6]. The performance of this improved algorithm and that of the algorithm in [6] are compared in Figure 1 for $0.5 < \alpha < 1$, using the first shell of the lattice $\Lambda_{16}$ as the orientation codebook, for three different images. It can be observed that with the improved algorithm the performance does not degrade when $\alpha$ decreases, and therefore the choice of $\alpha$ is much less critical than in the previous case. In practice, this means that the value of $\alpha$ can be made image independent, and therefore the modified algorithm is much more robust.

Figure 1: PSNR versus $\alpha$ for the images ZELDA, BOATS and LENA 256x256 at 0.5 bit/pixel with the "conventional" [6] and "improved" (Theorem 2) vector bit-plane coder, using the first shell of $\Lambda_{16}$ as orientation codebook.



5. CONCLUSIONS 

In this paper the concept of bit-plane encoding has been extended to vectors. It has been shown that there are orientation codebooks which can provide better performance than the trivial codebooks formed by separately bit-plane encoding each component of the vectors. Two theorems concerning the existence of "good" vector bit-plane decompositions have been stated. It has also been shown that when the first shells of the regular lattices which solve the sphere packing problem are used as orientation codebooks, the variations of vector bit-plane encoding proposed in this paper can outperform conventional bit-plane encoding in EZW-like embedded wavelet encoders. However, the optimum choice of orientation codebooks still needs further investigation. Considering the great importance of bit-plane encoding in many modern image coding schemes, the results obtained are encouraging.
ABSTRACT

In this paper, we introduce a mixed fixed-length coding (FLC) and variable-length coding (VLC) technique to improve the error resilience of the upcoming JPEG2000 image coding standard. Our proposed method operates on quantized subband coefficients, and produces a bit-stream with both FLC and VLC codewords. Higher error resilience is achieved by eliminating error propagation within the FLC sections, which in general make up the majority of the coded bit-stream. A highly efficient coding technique has been developed to generate fixed-length codewords for groups of quantization indices, which significantly reduces the amount of information to be entropy coded without significantly compromising the noiseless compression performance. Every effort has been made to minimize the impact of our modifications on the baseline coding structure and the bit-stream syntax defined in the standard, so the proposed technique can easily be integrated into any JPEG2000-compliant system.

1. INTRODUCTION 

The JPEG2000 coding algorithm consists of a wavelet/subband transformation, a dead-zone scalar quantization, and the "embedded block coding with optimized truncation" (EBCOT) [1] bitplane entropy coding technique, with a bit-stream truncation mechanism performing optimal rate control. For memory efficiency, the data streams flowing through these modules are lines of pixels, which is referred to as line-based processing. However, in both the transformation stage and the entropy coding stage, temporary data pools are formed in order to perform the block-based operations.

After the subband decomposition and the scalar quantization, the subband coefficients are partitioned into blocks with a default size of 64 x 64, and each block is coded independently of the others. The EBCOT algorithm employs a multi-pass bitplane entropy coding procedure. It operates on each individual block, and accesses each bitplane through three passes, namely the "significance propagation pass (P1)", the "magnitude refinement pass (P2)" and the "normalization pass (P3)". As in any bitplane coding algorithm, a magnitude threshold is calculated for each bitplane, and it is scaled down by a factor of 2 at each successive bitplane. A coefficient is identified as significant at a certain bitplane if its magnitude is above the corresponding threshold, and a significant coefficient remains significant through all the remaining bitplanes. At each bitplane, the P1 pass only tests those insignificant coefficients which have at least one significant neighbor, and the P3 pass tests all the remaining insignificant coefficients. If a coefficient is found to become significant during either of these two passes, its sign bit is encoded, and its location can be inferred from all the test decision bits. The significant coefficients which were found in previous bitplanes are refined during the P2 pass. All the symbols generated in these three passes are encoded by an adaptive arithmetic coding engine with a corresponding context modeling technique.

*THIS WORK WAS SUPPORTED IN PART BY U.S. ARMY RESEARCH OFFICE UNDER GRANT # DAAH04-96-1-0161.
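To make the three passes concrete, here is a deliberately simplified sketch (Python; names are ours, not from VM5.2) that assigns each coefficient of a block to P1, P2 or P3 at one bitplane, taking a coefficient as significant at bitplane p once its magnitude reaches 2^p (an assumed convention). Unlike the real EBCOT scan, which updates significance as it goes and also uses contexts and run modes, it judges neighbourhoods with the state at the start of the bitplane.

```python
def classify_passes(q, p, significant):
    """Assign each coefficient of the 2-D block q (integer quantization
    indices) to a coding pass at bitplane p (threshold 2**p).
    significant: set of (row, col) positions found significant earlier.
    Returns the pass map and the set of newly significant positions."""
    rows, cols = len(q), len(q[0])
    threshold = 2**p
    passes = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in significant:
                passes[(r, c)] = "P2"          # magnitude refinement
            elif any((r + dr, c + dc) in significant
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                passes[(r, c)] = "P1"          # significance propagation
            else:
                passes[(r, c)] = "P3"          # normalization
    newly = {rc for rc, name in passes.items()
             if name != "P2" and abs(q[rc[0]][rc[1]]) >= threshold}
    return passes, newly


q = [[9, 0, 1],
     [0, -3, 0],
     [0, 0, 0]]
significant = set()
for p in (3, 2, 1, 0):
    passes, newly = classify_passes(q, p, significant)
    print(p, sorted(newly))
    significant |= newly
# prints 3 [(0, 0)], then 2 [], 1 [(1, 1)], 0 [(0, 2)]
```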

In preparation for the EBCOT encoding operation, every four rows of the quantized coefficients are interleaved into one row, and four consecutive insignificant coefficients are encoded together to reduce the number of output symbols.
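The interleaving can be pictured with a short sketch (Python; hypothetical helper names, block height assumed to be a multiple of four): each stripe of four rows becomes one row in which every run of four consecutive entries is one column of the stripe, that is, a vertical 4x1 vector.

```python
def interleave_stripes(block):
    """Turn every stripe of four rows into one row whose consecutive groups of
    four entries are the 4x1 column vectors of that stripe."""
    if len(block) % 4:
        raise ValueError("block height must be a multiple of 4")
    out = []
    for top in range(0, len(block), 4):
        stripe = block[top:top + 4]
        # column-major order inside the stripe: column 0 top to bottom, then column 1, ...
        out.append([stripe[k][c] for c in range(len(block[0])) for k in range(4)])
    return out


def groups_of_four(row):
    """Split an interleaved row into its 4x1 column vectors."""
    return [tuple(row[i:i + 4]) for i in range(0, len(row), 4)]


def deinterleave_stripes(rows):
    """Inverse of interleave_stripes."""
    block = []
    for row in rows:
        cols = len(row) // 4
        block.extend([row[4 * c + k] for c in range(cols)] for k in range(4))
    return block


blk = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
rows = interleave_stripes(blk)
print(rows)                     # [[1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]]
print(groups_of_four(rows[0]))  # [(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12)]
```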

At the decoder, EBCOT performs the inverse process, in which coefficient values are decoded and placed at their correct locations. These coefficients are then de-interleaved to form the original subband blocks. This is followed by the inverse wavelet/subband transformation to obtain the reconstructed image.

2. PROPOSED CODING TECHNIQUE 

Our work is based on the verification model VM5.2, and it is a special implementation of the generalized adaptive quantization scheme introduced in [2]. The proposed modifications preserve the line-based processing pipeline and the block-based EBCOT procedure. All the changes are strictly within the bitplane entropy coding stage.

During the coefficient interleaving process, we introduce a new operation, which performs a quantization index conversion after every four coefficients are read from a subband block. We do not impose any restriction on the order in which the block coefficients are scanned. The index conversion operates on every group of four adjacent coefficients. In the current VM5.2 implementation, each of these four-coefficient groups is essentially a vertical (4x1) vector in the original subband block. The quantization index conversion is based on the codebook of a specially designed 4-D multi-stage lattice vector quantizer (LVQ) [3].

An LVQ codebook is a highly structured lattice codebook whose points effectively span a particular signal space. It does not require any training in the design phase, and it can be implemented efficiently without codeword storage. By using a multi-stage quantization structure with relatively small codebook sizes, our LVQ does not yield any significant increase in computational complexity compared to simple scalar quantizers.

Two LVQ codebooks have been designed for different quantization stages. Both are derived from the root lattice Z^4, which is the set of all integer points in 4-dimensional space [4]. A 6-bit-per-vector sphere-truncated LVQ is used for the first quantization stage, which is able to achieve sufficient shape gain on generalized Gaussian sources. The codebook is obtained by truncating the root lattice at an energy (squared L2 norm) of 3, which produces an LVQ with 3 energy shells and 64 symbols (8, 24 and 32 points per shell). In order to include the origin in the codebook, one point is removed from the third shell. Thus, the total number of codewords is 64, requiring 6-bit indices for binary representation. A 4-bit-per-vector cubic LVQ is applied to all the successive refinement stages, which guarantees the convergence of all the successive approximations. Before truncation, the lattice is shifted by the vector (1/2, 1/2, 1/2, 1/2), so that all the corresponding quantizers become mid-rise SQs. The truncation radius is set to 1 in the infinity norm, so that 16 codewords are obtained for each vector. By applying our LVQ over more than one stage, the resulting quantization cells are identical to those of the dead-zone scalar quantization, which confirms the compatibility of our LVQ with the baseline coding structure. The scaling factor between consecutive quantization stages is 1/2 in our multistage LVQ implementation, which is essentially consistent with the bitplane coding in EBCOT.
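The codeword counts above can be checked by enumeration. The sketch below (Python; names are ours) reads "energy" as the squared Euclidean norm: the Z^4 points with energy at most 3 are the origin plus shells of 8, 24 and 32 points, and dropping one third-shell point leaves 64 codewords. Which point is dropped is not specified in the text, so the choice below is arbitrary. The refinement codebook is the shifted lattice truncated to infinity norm 1, that is {+-1/2}^4, and its nearest codeword can be found coordinate by coordinate.

```python
import itertools


def first_stage_codebook():
    """Z^4 points with energy (squared L2 norm) <= 3, minus one energy-3 point.
    Coordinates outside {-1, 0, 1} would already have energy >= 4."""
    pts = [p for p in itertools.product((-1, 0, 1), repeat=4)
           if sum(x * x for x in p) <= 3]           # 1 + 8 + 24 + 32 = 65 points
    pts.sort(key=lambda p: (sum(x * x for x in p), p))
    pts.pop()           # hypothetical choice: drop the last energy-3 point
    return pts


def refinement_codebook():
    """Z^4 shifted by (1/2, 1/2, 1/2, 1/2), truncated to infinity norm <= 1."""
    return list(itertools.product((-0.5, 0.5), repeat=4))


def refine(residual):
    """Nearest refinement codeword, chosen per coordinate by sign
    (ties at 0 go to +1/2 in this sketch)."""
    return tuple(0.5 if x >= 0 else -0.5 for x in residual)


cb1, cb2 = first_stage_codebook(), refinement_codebook()
shells = [sum(1 for p in cb1 if sum(x * x for x in p) == r) for r in range(4)]
print(len(cb1), shells, len(cb2))   # 64 [1, 8, 24, 31] 16
```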



In our modified EBCOT coding passes P1 and P3, only the first coefficient of each four-coefficient group is tested and encoded. If this first coefficient is significant, all four coefficients are deemed significant; otherwise, all four coefficients remain insignificant. The adaptive arithmetic coding of this significance test is exactly the same as in the EBCOT approach. If the group is significant, we set the same context flags for all four coefficients. The difference between our modified P1 and P3 passes and the EBCOT passes is that we do not need to send the sign bits of newly significant coefficients, because the sign information of each coefficient group is included in its first-stage LVQ index. After the newly significant coefficients are found at each bitplane, a first-stage LVQ codeword is generated for each such four-coefficient group in the P2 pass. Similar to EBCOT, our P2 pass also contains the refinement-stage LVQ indices for all the previously found significant coefficient groups. It is important to point out that we do not use any arithmetic coding in the P2 pass. This arrangement is easily achieved because the baseline JPEG2000 defines an operation mode that uses only binary (raw) coding for the P2 pass; therefore we do not have to introduce any new coding engine into the JPEG2000 coding structure. The final bit stream is organized so that the P1 and P3 passes make up all the arithmetic coded (VLC) sections, and the P2 pass contains all the binary coded (FLC) sections. Our modifications are transparent to the bit stream formation processes, and they do not affect any bit stream manipulation or other existing error resilience mechanisms.

At the decoder, the P1 and P3 passes perform the arithmetic decoding of all the significance information for each four-coefficient group at each bitplane. This information essentially identifies the locations of all the significant coefficients whose quantization indices have been included in the P2 pass. Once the significance of a coefficient group is known, the symbol length of its quantization index in the P2 pass is also known. Because of the binary coding used in the P2 pass, there is no confusion about which symbol belongs to which coefficient group. After all the decoding passes are completed, an inverse quantization index conversion is performed during the corresponding de-interleaving process, and the original subband block is reconstructed.
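The decoder-side argument can be illustrated with a toy P2 section (Python; the group labels, index values and the ordering of symbols within the pass are hypothetical). Because the group lists come from the decoded P1/P3 significance information, each 6-bit first-stage index and 4-bit refinement index sits at a known position, so a flipped bit corrupts exactly one index.

```python
FIRST_BITS, REFINE_BITS = 6, 4   # index lengths of the two LVQ codebooks


def pack_p2(new_groups, old_groups, first_idx, refine_idx):
    """Concatenate fixed-length indices for one bitplane: 6 bits per newly
    significant group, then 4 bits per previously significant group."""
    bits = [format(first_idx[g], "06b") for g in new_groups]
    bits += [format(refine_idx[g], "04b") for g in old_groups]
    return "".join(bits)


def parse_p2(bits, new_groups, old_groups):
    """Decoder side: the group lists are known from P1/P3, so every symbol's
    position and length are known in advance."""
    pos, first_idx, refine_idx = 0, {}, {}
    for g in new_groups:
        first_idx[g] = int(bits[pos:pos + FIRST_BITS], 2)
        pos += FIRST_BITS
    for g in old_groups:
        refine_idx[g] = int(bits[pos:pos + REFINE_BITS], 2)
        pos += REFINE_BITS
    return first_idx, refine_idx


def flip(bits, i):
    """Simulate a single channel bit error at position i."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


new, old = ["g3", "g7"], ["g0", "g1", "g5"]
first = {"g3": 41, "g7": 5}
ref = {"g0": 9, "g1": 15, "g5": 0}
stream = pack_p2(new, old, first, ref)           # 2 * 6 + 3 * 4 = 24 bits
f2, r2 = parse_p2(flip(stream, 13), new, old)    # bit 13 lies in g0's index
print(f2 == first, r2)                           # True {'g0': 13, 'g1': 15, 'g5': 0}
```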

3. ERROR RESILIENCE FEATURE 

The effects of channel noise on source-coded bit streams can be classified into two categories, propagational and non-propagational. If a bit error causes a loss of synchronization between the encoder and the decoder, so that the bit stream following the erroneous bit is entirely non-decodable, it is called a propagational error. This usually happens when the source bit stream is coded with a variable-length coding (VLC) scheme. On the other hand, a bit error in some bit streams may only cause a single erroneous symbol at the decoder; this is called a non-propagational error. Non-propagational errors usually occur when fixed-length coding (FLC) schemes are used. The problem with VLC in noisy channels is that it introduces inter-symbol dependencies within the coded bit stream: correct decoding of each source symbol depends on the correctness of all the preceding decoded symbols. FLC schemes do not have this problem, because the length of each source symbol is fixed and known to both the encoder and the decoder, so every symbol can be decoded independently. Any bit error in an FLC-coded bit stream is confined within a single symbol and does not affect the decoding of any other symbol. In general, the exclusive choice of either VLC or FLC represents a tradeoff between compression efficiency and error resilience. Our proposed scheme explores a new approach in which a better balance between these two aspects can be achieved.
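A toy comparison of the two error types (Python). The four-symbol code tables are made up for illustration, and a simple prefix code stands in for VLC; arithmetic coding, as used in EBCOT, is not modelled. Flipping the first bit desynchronizes the VLC decoder, so every later symbol lands in the wrong position, while the FLC stream loses exactly one symbol.

```python
from itertools import zip_longest

VLC = {"a": "0", "b": "10", "c": "110", "d": "111"}   # illustrative prefix code
FLC = {"a": "00", "b": "01", "c": "10", "d": "11"}    # illustrative 2-bit code


def encode(msg, code):
    return "".join(code[s] for s in msg)


def decode(bits, code):
    """Prefix decoding; it also works for the FLC table, which is prefix-free."""
    inverse, out, cur = {v: k for k, v in code.items()}, [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


def flip(bits, i):
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


def symbol_errors(sent, received):
    """Position-by-position mismatches (missing or extra symbols count too)."""
    return sum(x != y for x, y in zip_longest(sent, received))


msg = "abcdabcd"
for name, code in (("VLC", VLC), ("FLC", FLC)):
    received = decode(flip(encode(msg, code), 0), code)
    print(name, received, symbol_errors(msg, received))
# VLC ccdabcd 8
# FLC cbcdabcd 1
```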

Based on the description in the previous section, our modified P1 and P3 passes are VLC coded and the P2 pass is FLC coded. In an error resilient mode, each pass is packetized, and there is no error propagation across the boundary of any two adjacent passes. In this situation, the channel noise effects within our coded data stream are quite different from those in the JPEG2000 bit stream. In the EBCOT coding passes, any bit error is in fact a propagational error because of the extensive use of arithmetic coding. Therefore one bit error may cause the loss of a whole packet and of all the remaining packets in the same subband block. In our bit stream, however, only bit errors in the P1 and P3 passes may cause similar damage. Bit errors in the P2 pass are not propagational because of its FLC scheme; therefore they can only cause localized distortion, and no packet is lost.

The advantage of this approach becomes apparent because our specially designed LVQ coding technique keeps the VLC-coded sections very small without any significant loss of compression performance. In our modified P1 and P3 passes, only one test symbol is generated for every four coefficients, and no sign bit is encoded. Clearly, the information processed by these passes is at least four times smaller than the amount encoded in the EBCOT approach. Therefore the resulting bit streams from the P1 and P3 passes are much smaller than those produced by the EBCOT passes. In our extensive tests, we have found that for most images at commonly used bit rates, our coding scheme based on four-coefficient groups with 4-D LVQ indexing usually produces about 20%-30% VLC-coded sections and 70%-80% FLC-coded sections, as shown in Tables 1-2. Compared to the EBCOT approach, where 100% of the bit stream is VLC coded, our data streams clearly have less chance of manifesting propagational errors. It is important to note that this error resilience feature does not rely on any error detection or correction scheme, and also does not reduce the effectiveness of any such scheme if one is employed.

                  J2K VM5.2             Our Mod. VM5.2
Image         PSNR (dB)   FLC %     PSNR (dB)   FLC %
WOMAN           27.27       0         26.81     76.39
CAFE            20.70       0         19.91     74.67
GOLD HILL       28.10       0         27.68     71.99

Table 1. PSNR performance of tested image coders at a bit rate of 0.125 bpp in a noiseless environment.

                  J2K VM5.2             Our Mod. VM5.2
Image         PSNR (dB)   FLC %     PSNR (dB)   FLC %
WOMAN           33.50       0         32.72     76.96
CAFE            26.69       0         25.67     73.91
GOLD HILL       32.78       0         31.87     75.53

Table 2. PSNR performance of tested image coders at a bit rate of 0.5 bpp in a noiseless environment.

4. EXPERIMENTAL RESULTS 

We have tested the JPEG2000 VM5.2 coder and our modified VM5.2 coder in both noiseless and noisy channel conditions. In the noisy channel simulations, a binary symmetric channel (BSC) with random noise at BERs of 10^-4 and 10^-3 is used as the channel model. These BERs resemble some typical residual noise conditions in common error-controlled communication networks. In both coders, the packet resynchronization marker defined in the baseline is used as an error detection measure, and a simple error cancellation mechanism is used at the decoder, which discards any VLC packet with a detected channel error. The images "WOMAN" (2048 x 2560), "CAFE" (2048 x 2560) and "GOLD HILL" (512 x 512) are coded at 0.125 bpp and 0.50 bpp. All noisy channel simulations are repeated 100 times for each test condition. Tables 1-6 give the results of our simulations.

As we can see from these results, although our modified image coder produces a bit stream that is mostly FLC, it still performs reasonably well in a noiseless channel compared to the original VM5.2 coder. On the other hand, we can clearly observe a performance improvement of up to 3 dB in the various noisy channel conditions.

Coder        Image        Mean (dB)   Max (dB)   Min (dB)   Std (dB)
J2K VM5.2    WOMAN          21.83       26.19      13.05      3.08
J2K VM5.2    CAFE           15.59       19.33      12.13      (illegible)
J2K VM5.2    GOLD HILL      26.67       28.10      17.89      1.39
Mod. VM5.2   WOMAN          23.39       26.55      13.20      2.74
Mod. VM5.2   CAFE           17.66       19.74      12.69      1.77
Mod. VM5.2   GOLD HILL      27.16       27.74      22.32      0.94

Table 3. PSNR performance of tested image coders at a bit rate of 0.125 bpp in a random-noise BSC with BER = 10^-4.

Coder        Image        Mean (dB)   Max (dB)   Min (dB)   Std (dB)
J2K VM5.2    WOMAN          21.36       26.39      13.07      2.92
J2K VM5.2    CAFE           15.44       20.76      11.77      2.15
J2K VM5.2    GOLD HILL      28.63       31.32      17.93      2.01
Mod. VM5.2   WOMAN          23.07       30.75      12.97      2.93
Mod. VM5.2   CAFE           17.40       23.61      11.96      2.02
Mod. VM5.2   GOLD HILL      29.60       31.85      20.87      2.73

Table 4. PSNR performance of tested image coders at a bit rate of 0.5 bpp in a random-noise BSC with BER = 10^-4.

Because our mixed FLC/VLC coding module operates on the same quantized subband coefficients as the VM coder, and their rate control mechanisms are also the same, the difference between their noiseless performance is due exclusively to the different coding efficiencies of the FLC/VLC coding and the arithmetic coding. We notice that our LVQ coding method performs surprisingly well, despite its fixed-length codewords and much simpler coding structure, compared to sophisticated and highly efficient arithmetic coding.

In general, our coder performs better at low bit rates and at low BERs. This is because in these situations the VLC packets are very short, and therefore their survival rates are high. On the other hand, when the bit rate and/or BER gets higher, even though our VLC packets are about 4 times shorter than their counterparts from the VM coder, they are still long enough to suffer one or more bit errors, which consequently causes the cancellation of the whole packet. This can be described as a noise-saturated state for these VLC packets. It is easy to see that at a certain point the channel noise becomes so severe that almost all the packets are eventually cancelled, and our coder then has the same performance as the VM coder. To avoid this situation, we can apply forward error correction (FEC) to the bit stream. With our short VLC packets and error-resilient FLC packets, we can develop a more effective error protection scheme than is possible with the JPEG2000 coder.
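To make the packet-survival argument concrete, here is a minimal sketch assuming independent bit errors (a memoryless BSC, as in the random-noise simulations) and a decoder that cancels a VLC packet on any bit error. Under those assumptions a packet of L bits survives with probability (1 - BER)^L. The function name and the packet lengths are hypothetical, chosen only to mimic the roughly 4:1 length ratio mentioned above.

```python
def packet_survival(ber: float, packet_bits: int) -> float:
    """Probability that a packet of `packet_bits` bits crosses a memoryless BSC
    with bit error rate `ber` without a single bit error (i.e. is not cancelled)."""
    return (1.0 - ber) ** packet_bits


if __name__ == "__main__":
    # Hypothetical packet lengths: a long VM-style packet and one about 4x shorter.
    for ber in (1e-4, 1e-3):
        for bits in (2000, 500):
            print(f"BER={ber:g}  L={bits:5d} bits  survival={packet_survival(ber, bits):.3f}")
```

At BER = 10^-3 the shorter packet still survives most of the time while the longer one is usually lost; as the product L x BER grows, both probabilities tend to zero, which corresponds to the noise-saturated state described above.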





PSNR (dB)                  Mean    Max     Min     Std
J2K VM5.2    WOMAN         14.06   17.99   12.14   1.32
             CAFE          10.96   11.48   10.49   0.23
             GOLD HILL     21.77   25.11   16.43   2.19
Mod. VM5.2   WOMAN         17.28   22.19   12.19   3.26
             CAFE          12.55   15.18   10.51   1.27
             GOLD HILL     23.27   26.81   17.71   2.16

Table 5. PSNR performance of tested image coders at bit rate of 0.125 bpp in random noise BSC with BER = 10^-3.





PSNR (dB)                  Mean    Max     Min     Std
J2K VM5.2    WOMAN         13.87   17.95   12.14   1.41
             CAFE          10.97   11.48   10.58   0.26
             GOLD HILL     21.07   25.31   14.33   2.30
Mod. VM5.2   WOMAN         17.40   22.10   12.18   3.24
             CAFE          12.70   15.27   10.51   1.33
             GOLD HILL     22.51   26.85   14.80   2.33

Table 6. PSNR performance of tested image coders at bit rate of 0.5 bpp in random noise BSC with BER = 10^-3.
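The mean-PSNR columns of Tables 3-6 can be compared directly. The short script below only copies those means and subtracts them; it adds no new measurements and ignores the spread reported in the Max/Min/Std columns. The dictionary layout and function name are our own illustration.

```python
# Mean PSNR (dB) copied from Tables 3-6 as (J2K VM5.2, Mod. VM5.2).
MEAN_PSNR = {
    "Table 3 (0.125 bpp, BER 1e-4)": {"WOMAN": (21.83, 23.39), "CAFE": (15.59, 17.66), "GOLD HILL": (26.67, 27.16)},
    "Table 4 (0.5 bpp, BER 1e-4)": {"WOMAN": (21.36, 23.07), "CAFE": (15.44, 17.40), "GOLD HILL": (28.63, 29.60)},
    "Table 5 (0.125 bpp, BER 1e-3)": {"WOMAN": (14.06, 17.28), "CAFE": (10.96, 12.55), "GOLD HILL": (21.77, 23.27)},
    "Table 6 (0.5 bpp, BER 1e-3)": {"WOMAN": (13.87, 17.40), "CAFE": (10.97, 12.70), "GOLD HILL": (21.07, 22.51)},
}


def mean_gains(table):
    """Per-image mean-PSNR gain (dB) of Mod. VM5.2 over J2K VM5.2, rounded to 0.01 dB."""
    return {
        name: {img: round(mod - j2k, 2) for img, (j2k, mod) in rows.items()}
        for name, rows in table.items()
    }


if __name__ == "__main__":
    for name, gains in mean_gains(MEAN_PSNR).items():
        print(name, gains)
```

Running it shows that every mean gain is positive, and that the largest differences occur for WOMAN at BER = 10^-3.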

5. CONCLUSION 

We have introduced a mixed FLC/VLC coding technique under the JPEG2000 image coding structure. It achieves compression efficiency comparable to that of the baseline coder in a noiseless environment, and yet provides better performance in noisy channels. Our modification is compatible with the basic coding structure and bit-stream syntax defined in the standard, and can be easily adopted into existing systems. Experimental results have clearly demonstrated the viability of our approach in both noiseless and noisy situations.
Global and Local Distortion Inference During
Embedded Zerotree Wavelet Decompression



A. Kris Huber and Scott E. Budge 
Electrical and Computer Engineering Dept. 
Utah State University 
Logan, Utah 84322-4120 
e-mail: kris@ece.usu.edu & scott@goga.ece.usu.edu 

Abstract 

Local distortion inference is proposed as an alternative to assuming a spatially uniform 
error for image compression applications requiring careful error analysis. This paper 
presents an algorithm for inferring estimates of the L2-norm distortion (root-mean-square
or RMS error) for the Embedded Zerotree Wavelet (EZW) algorithm while maintaining 
its good rate-distortion performance. A small amount of both rate and computational 
burden is required of the encoder to calculate and transmit sums of wavelet coefficient 
energies. A greater computational burden is added to the decoder, mainly by an 
"error-propagation transform" of equal complexity to the inverse hierarchical wavelet 
transform. The asymmetry of the compression system with distortion inference is ideal 
for space-based data gathering applications where computation capacity may be limited 
in the encoder but virtually unlimited in the decoder. 

Global distortion inference is accomplished by maintaining error energy estimates of 
the wavelet coefficients separately for the subordinate and dominant lists. An equal 
reduction of error energy per significant bit is found to accurately interpolate the 
operational rate-distortion curve, which is explicitly transmitted prior to each dominant 
pass. Because of the orthogonality of the wavelet transform, the rate-distortion curve also 
applies to reconstructed images in the spatial domain. No additional rate overhead is 
needed to obtain a local distortion estimate. Individual estimates of wavelet coefficient 
error energies may be transformed to the spatial domain by applying the statistical 
propagation of errors formula for weighted sums of random variables. The resulting local 
distortion information is a "noise image" which gives an estimate of the RMS error for 
each pixel of the decompressed image. This local information is analogous to the error- 
bars often plotted on graphs of 1 -dimensional data. It can be used during an analysis to 
more appropriately weight the value of each pixel, rather than weighting them all equally. 
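The propagation-of-errors step can be illustrated on a toy transform. For a pixel that is a weighted sum of coefficients with uncorrelated errors, Var(sum_c w_c X_c) = sum_c w_c^2 Var(X_c). The sketch below applies this to a one-level orthonormal Haar synthesis of four pixels; the matrix, the coefficient variances, and the function name are hypothetical, and the paper's error-propagation transform operates on the full inverse hierarchical wavelet transform instead. Because the Haar synthesis is orthonormal, the sketch also shows that the total pixel error energy equals the total coefficient error energy, which is the property behind the global estimate.

```python
import math


def noise_image(synthesis, coeff_var):
    """Per-pixel RMS error from per-coefficient error variances.

    synthesis[p][c] is the weight of coefficient c in reconstructed pixel p
    (pixel_p = sum_c synthesis[p][c] * coeff_c). Errors are assumed uncorrelated,
    so Var(pixel_p) = sum_c synthesis[p][c]**2 * coeff_var[c].
    """
    return [math.sqrt(sum(w * w * v for w, v in zip(row, coeff_var))) for row in synthesis]


# One-level orthonormal Haar synthesis for 4 pixels; coefficients ordered [a0, a1, d0, d1].
s = 1 / math.sqrt(2)
HAAR_SYNTH = [
    [s, 0, s, 0],
    [s, 0, -s, 0],
    [0, s, 0, s],
    [0, s, 0, -s],
]

coeff_var = [4.0, 0.0, 1.0, 1.0]  # hypothetical coefficient error energies
rms = noise_image(HAAR_SYNTH, coeff_var)  # local estimate: one RMS value per pixel
total_pixel = sum(r * r for r in rms)     # equals sum(coeff_var) for an orthonormal transform
```

Here the first two pixels inherit most of the error energy of a0, so their RMS estimates are larger than those of the last two pixels, even though the global RMS error would assign every pixel the same value.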

This local distortion estimate is most useful at low bit rates, when compression error
dominates. However, large errors can occur that are significantly underestimated by the
noise image. This is probably caused by correlation of the error with the input image
and/or with nearby errors. The error-propagation transform, as implemented, assumes the
errors are uncorrelated, since no correlation information is available at the decoder. In
spite of this defect, the local estimate is usually a better estimate than the global RMS
error at low bit rates, even for the outliers. At higher bit rates the EZW compression
error becomes nearly Gaussian and quite spatially uniform. In this case the local
distortion estimate gives little or no improvement over the global estimate.
Abstract:

This paper presents a new error concealment (EC) technique for DCT based image coding. In our approach, the damaged blocks are recovered utilizing the smoothness property of an image at the boundaries of the blocks. Based on this property, we first define an object function which represents the intersample variations between adjacent blocks. Then, the DCT coefficients which minimize the object function are estimated by finding a solution of a linear equation. We also show that it can be decomposed and reduced into simpler sub-equations. Thus, the computational complexity of the proposed technique is very low compared to the existing techniques. Computer simulation results show that the proposed algorithm recovers the damaged blocks even if the block loss rate (BLR) is as high as 10^-2.
ON THE ERROR CONCEALMENT TECHNIQUE FOR DCT BASED IMAGE CODING

Jong Wook Park Dong Sik Kim Sang Uk Lee 

Signal Processing Lab. Dept. of Cont. & Inst. Eng.
Seoul National University, Seoul 151-742, KOREA 



ABSTRACT 

This paper presents a new error concealment (EC) technique for DCT based image coding. In our approach, the damaged blocks are recovered utilizing the smoothness property of an image at the boundaries of the blocks. Based on this property, we first define an object function which represents the intersample variations between adjacent blocks. Then, the DCT coefficients which minimize the object function are estimated by finding a solution of a linear equation. We also show that it can be decomposed and reduced into simpler sub-equations. Thus, the computational complexity of the proposed technique is very low compared to the existing techniques. Computer simulation results show that the proposed algorithm recovers the damaged blocks without visually annoying artifacts, even if the block loss rate (BLR) is as high as 10^-3.

1. INTRODUCTION 

Most recent image coding standards use the discrete cosine transform (DCT) and variable length coding (VLC) to compress image data. When the compressed image data is transmitted, even a 1-bit error in the bit stream may cause considerable damage to the reconstructed image quality. Thus, a scheme to alleviate the effect of inevitably occurring errors is necessary for reliable image communication. In general, there are two approaches: forward error correction (FEC) and EC techniques. EC attempts to fill the damaged areas with estimates, which are usually obtained from spatially or temporally neighboring data. Thus, EC can recover errors left uncorrected by FEC, and does not require any additional information.

Several EC techniques have been proposed [1-4], based on temporal replacement (TR) or spatial interpolation (SI) techniques. The TR technique is known to be very effective in slow-motion areas, while the SI technique is preferred for fast-moving objects. In SI techniques, however, it is important to reconstruct plausible DCT coefficients for the damaged blocks. Sun [2] adopted the blocking effect reduction algorithm of JPEG [7]. Wang et al. introduced an optimization technique based on the smoothness property of an image [1], and Lee et al. proposed the use of fuzzy logic reasoning [4].

In this paper, we present a new EC technique for DCT based image coding. In our approach, we first define an object function to describe the intersample variation at the boundary between adjacent blocks. Then, by estimating the DCT coefficients which minimize the object function in the least squares (LS) sense, the recovered block is smoothly connected to the neighboring blocks. The computer simulation results indicate that the recovered blocks are not easily recognized, even when the block loss rate (BLR) is of the order of 10^-3, where the BLR is the ratio of the average number of damaged blocks to the total number of received blocks [6]. Moreover, the computational complexity of the proposed algorithm is of order O(N), where N is the block size, making a real-time implementation possible.

2. OBJECT FUNCTION FOR DCT 
COEFFICIENT RECOVERY 

The effects of bit errors in the bit stream are very complex and unpredictable. Thus, in order to simplify the problem, we introduce several assumptions. First, we employ the well-known smoothness assumption on the luminance level within a small area of an image, as in [1]. The second assumption is that all the DCT coefficients in an erroneous block are completely lost. The third is that the locations of erroneous blocks can be isolated, provided that a proper block interleaving algorithm, such as [5], is used. Last, the locations of damaged blocks are known a priori, since it is possible to check whether a block is erroneous or not by examining the protocol, synchronizing codewords, etc.

Next, let us introduce some notation for the blocks and vectors. Let the size of a block be $N \times N$, and let an image be composed of a $Q \times P$ array of blocks. Denoting the $r$-th block by $A_r$, the upper, lower, left, and right neighbors of $A_r$ are denoted by $A_{r-P}$, $A_{r+P}$, $A_{r-1}$, and $A_{r+1}$, respectively. Also, let the four boundary vectors of $A_r$ in the top, bottom, left, and right directions be $t_r$, $b_r$, $l_r$, and $r_r$, respectively. All the defined blocks and vectors are illustrated in Fig. 1.

Now, we define an object function for the estimation of the lost DCT coefficients. Based on the smoothness assumption, the recovered block should be smoothly connected to the neighboring blocks at the boundaries. Hence, the object function is defined as

$$\Psi = \|t_r - b_{r-P}\|^2 + \|b_r - t_{r+P}\|^2 + \|l_r - r_{r-1}\|^2 + \|r_r - l_{r+1}\|^2. \qquad (1)$$

Notice that the object function measures the degree of smoothness. From the energy-preserving property of the DCT, the object function can be rewritten in the DCT domain as

$$\Psi = \|T_r - B_{r-P}\|^2 + \|B_r - T_{r+P}\|^2 + \|L_r - R_{r-1}\|^2 + \|R_r - L_{r+1}\|^2, \qquad (2)$$

where $T_r$, $B_r$, $L_r$, and $R_r$ are the 1-dimensional DCTs of the boundary vectors $t_r$, $b_r$, $l_r$, and $r_r$, respectively. On the other hand, after some algebraic manipulation and from the definition of the DCT, we obtain the following relations:

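$$T_r(j) = \sum_{k=0}^{N-1} \beta_k\, \mathcal{A}_r(k,j), \quad j = 0,\dots,N-1, \qquad (3a)$$

$$B_r(j) = \sum_{k=0}^{N-1} \sigma_k \beta_k\, \mathcal{A}_r(k,j), \quad j = 0,\dots,N-1, \qquad (3b)$$

$$L_r(i) = \sum_{l=0}^{N-1} \beta_l\, \mathcal{A}_r(i,l), \quad i = 0,\dots,N-1, \qquad (3c)$$

$$R_r(i) = \sum_{l=0}^{N-1} \sigma_l \beta_l\, \mathcal{A}_r(i,l), \quad i = 0,\dots,N-1, \qquad (3d)$$

where

$$\beta_k = \alpha(k)\cos\frac{k\pi}{2N}, \qquad \sigma_k = (-1)^k, \qquad (4)$$

and $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ otherwise. Also, $\mathcal{A}_r$ denotes the 2-dimensional DCT of $A_r$.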

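By substituting (3) into (2) and applying the $l_2$ norm, the object function becomes

$$
\begin{aligned}
\Psi ={}& \sum_{j=0}^{N-1}\Big[\sum_{k=0}^{N-1}\beta_k\mathcal{A}_r(k,j)-\sum_{k=0}^{N-1}\sigma_k\beta_k\mathcal{A}_{r-P}(k,j)\Big]^2
+\sum_{j=0}^{N-1}\Big[\sum_{k=0}^{N-1}\sigma_k\beta_k\mathcal{A}_r(k,j)-\sum_{k=0}^{N-1}\beta_k\mathcal{A}_{r+P}(k,j)\Big]^2\\
&+\sum_{i=0}^{N-1}\Big[\sum_{l=0}^{N-1}\beta_l\mathcal{A}_r(i,l)-\sum_{l=0}^{N-1}\sigma_l\beta_l\mathcal{A}_{r-1}(i,l)\Big]^2
+\sum_{i=0}^{N-1}\Big[\sum_{l=0}^{N-1}\sigma_l\beta_l\mathcal{A}_r(i,l)-\sum_{l=0}^{N-1}\beta_l\mathcal{A}_{r+1}(i,l)\Big]^2 .
\end{aligned}
\qquad (5)
$$

It is worth noting that (5) is a function of the $N^2$ variables $\{\mathcal{A}_r(k,l) \mid k,l = 0,\dots,N-1\}$.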

3. ESTIMATION OF THE DCT COEFFICIENT 
In this section, we present an algorithm to calculate the $\mathcal{A}_r$ which minimizes the object function $\Psi$. Examining (5), we can see that $\Psi$ is a quadratic function of each $\mathcal{A}_r(m,n)$, where $m,n = 0,\dots,N-1$. Therefore, the minimum of $\Psi$ is found where all the gradients vanish. By partially differentiating both sides of (5) with respect to $\mathcal{A}_r(m,n)$, we obtain the following $N^2$ equations:

$$
2(\beta_m^2+\beta_n^2)\,\mathcal{A}_r(m,n)
+\sum_{k=0,\,k\neq m}^{N-1}\beta_m\beta_k(1+\sigma_m\sigma_k)\,\mathcal{A}_r(k,n)
+\sum_{l=0,\,l\neq n}^{N-1}\beta_n\beta_l(1+\sigma_n\sigma_l)\,\mathcal{A}_r(m,l)
=\Phi(m,n), \quad m,n = 0,\dots,N-1, \qquad (6)
$$

where

$$
\Phi(m,n) = \beta_m\sum_{k=0}^{N-1}\beta_k\big[\sigma_k\mathcal{A}_{r-P}(k,n)+\sigma_m\mathcal{A}_{r+P}(k,n)\big]
+\beta_n\sum_{l=0}^{N-1}\beta_l\big[\sigma_l\mathcal{A}_{r-1}(m,l)+\sigma_n\mathcal{A}_{r+1}(m,l)\big]. \qquad (7)
$$

All the $N^2$ linear equations in (6) can be represented in matrix-vector form as

$$H\mathbf{f} = \boldsymbol{\phi}, \qquad (8)$$

where $H$ is an $N^2 \times N^2$ matrix, each row of which represents one of the linear equations in (6), $\mathbf{f}$ is a vector composed of the coefficients $\{\mathcal{A}_r(m,n)\}$, and similarly $\boldsymbol{\phi}$ is a vector composed of $\{\Phi(m,n)\}$. Hence, by solving the linear equation (8), we can estimate the lost coefficients $\mathcal{A}_r$ which minimize $\Psi$ in the LS sense.
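The least-squares estimate can be prototyped without forming (6)-(8) explicitly. Since the DCT is orthonormal, Psi in (1) equals Psi in (2), so one can stack the four boundary samples of each DCT basis image into a matrix G and solve min ||G f - c||^2, where c holds the neighbouring boundary pixels; the normal equations G^T G f = G^T c should then correspond to (8). The sketch below does this with `numpy.linalg.lstsq`. It is our illustration, not the authors' solver, and the function and parameter names are hypothetical. With only 4N boundary constraints for N^2 unknowns the full system is rank-deficient, so `lstsq` returns the minimum-norm minimizer; restricting `coeff_idx` to a few low-frequency coefficients mimics the reduction described in the text that follows.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[k, x] = alpha(k) * cos((2x + 1) k pi / (2n))."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    alpha = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return alpha * np.cos((2 * x + 1) * k * np.pi / (2 * n))


def conceal_block(up, down, left, right, coeff_idx=None):
    """Estimate a lost N x N block by minimizing the boundary mismatch Psi.

    up, down, left, right: spatial N x N neighbours A_{r-P}, A_{r+P}, A_{r-1}, A_{r+1}.
    coeff_idx: (m, n) DCT indices to estimate; all other coefficients are set to 0.
    Returns the reconstructed spatial block.
    """
    n = up.shape[0]
    C = dct_matrix(n)
    if coeff_idx is None:
        coeff_idx = [(m, q) for m in range(n) for q in range(n)]
    columns = []
    for m, q in coeff_idx:
        basis = np.outer(C[m], C[q])  # spatial image of a unit coefficient at (m, q)
        # Boundary samples t_r, b_r, l_r, r_r of that basis image.
        columns.append(np.concatenate([basis[0, :], basis[-1, :], basis[:, 0], basis[:, -1]]))
    G = np.stack(columns, axis=1)
    # Neighbour samples each boundary should match: b_{r-P}, t_{r+P}, r_{r-1}, l_{r+1}.
    target = np.concatenate([up[-1, :], down[0, :], left[:, -1], right[:, 0]])
    solution, *_ = np.linalg.lstsq(G, target, rcond=None)
    coeffs = np.zeros((n, n))
    for value, (m, q) in zip(solution, coeff_idx):
        coeffs[m, q] = value
    return C.T @ coeffs @ C  # inverse 2-D DCT back to pixels
```

For example, `coeff_idx=[(m, q) for m in range(3) for q in range(3)]` (a hypothetical low-frequency choice, not necessarily the 12 coefficients of Fig. 2) restricts the estimate to a smooth corner of the spectrum.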

In order to solve (8), the inversion of $H$ is necessary. But, in most cases the size of $H$ is so large that the inversion is difficult and unreliable. Therefore, it is necessary to obtain the solution indirectly by simplifying $H$ in the following two steps. First, let us decompose the matrix $H$ into 4 sub-matrices. Among the coefficients $\{\mathcal{A}_r(k,n_i) \mid k = 0,\dots,N-1\}$ involved in the computation of $\mathcal{A}_r(m_i,n_i)$ for given integers $m_i$ and $n_i$, only the coefficients whose index $k$ satisfies the following condition are selected:

$$k \bmod 2 = m_i \bmod 2. \qquad (9)$$

Note that, depending on the value of $k$, the term $(1+\sigma_{m_i}\sigma_k)$ in (6) becomes zero if condition (9) is not satisfied. Similarly, the condition for $l$ is given by

$$l \bmod 2 = n_i \bmod 2, \qquad (10)$$

which can be derived from the term $(1+\sigma_{n_i}\sigma_l)$ in (6). It can be shown that these two conditions make it possible to group the elements of $\{\mathcal{A}_r\}$ into 4 classes, namely,

$$
\begin{aligned}
S_1 &= \{\mathcal{A}_r(m,n) \mid m \text{ even},\ n \text{ even}\},\\
S_2 &= \{\mathcal{A}_r(m,n) \mid m \text{ even},\ n \text{ odd}\},\\
S_3 &= \{\mathcal{A}_r(m,n) \mid m \text{ odd},\ n \text{ even}\},\\
S_4 &= \{\mathcal{A}_r(m,n) \mid m \text{ odd},\ n \text{ odd}\}.
\end{aligned}
\qquad (11)
$$

Then, it is easy to show that the matrix $H$ can be decomposed into 4 sub-matrices. As a result, (8) is divided into 4 sub-equations, i.e.,

$$H_i\mathbf{f}_i = \boldsymbol{\phi}_i, \quad i = 1,2,3,4, \qquad (12)$$

where $\mathbf{f}_i$ is composed of the elements of $S_i$, and $\boldsymbol{\phi}_i$ is the vector composed of the elements of $\boldsymbol{\phi}$ corresponding to $\mathbf{f}_i$.
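The decoupling behind (11)-(12) can be checked numerically. This stand-alone sketch rebuilds H as G^T G from the four boundary profiles of each DCT basis image (our construction, with hypothetical helper names), then measures the largest entry that links two different parity classes. It also checks that a diagonal entry equals 2(beta_m^2 + beta_n^2), where beta_k = alpha(k) cos(k pi / 2N) is the first column of the orthonormal DCT matrix, matching the diagonal coefficient in (6).

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[k, x] = alpha(k) * cos((2x + 1) k pi / (2n))."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    alpha = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return alpha * np.cos((2 * x + 1) * k * np.pi / (2 * n))


def boundary_matrix(n: int):
    """H = G^T G, where column (m, q) of G holds the top, bottom, left and right
    boundary samples of the DCT basis image for coefficient (m, q)."""
    C = dct_matrix(n)
    idx = [(m, q) for m in range(n) for q in range(n)]
    G = np.stack(
        [np.concatenate([b[0, :], b[-1, :], b[:, 0], b[:, -1]])
         for b in (np.outer(C[m], C[q]) for m, q in idx)],
        axis=1,
    )
    return G.T @ G, idx


def parity_class(m: int, q: int) -> int:
    """Class of coefficient (m, q) as in (11): 1 even/even, 2 even/odd, 3 odd/even, 4 odd/odd."""
    return 1 + 2 * (m % 2) + (q % 2)


H, idx = boundary_matrix(8)
# Largest coupling between coefficients of different classes (numerically ~0).
cross = max(
    abs(H[i, j])
    for i, a in enumerate(idx)
    for j, b in enumerate(idx)
    if parity_class(*a) != parity_class(*b)
)
```

Reordering the rows and columns of H by class therefore gives a block-diagonal matrix with four 16 x 16 blocks for N = 8, which is the decomposition stated in (12).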

Each sub-matrix can be further simplified by ignoring the high-frequency DCT coefficients, since the dominant low-frequency DCT coefficients are sufficient to reconstruct the damaged block within tolerable degradation. Thus, if we calculate only $v$ coefficients from each of the 4 sub-equations, the dimension of each sub-matrix is reduced to $v \times v$, resulting in a considerable saving in computational complexity.

For example, in the case of $v = 3$, only the 12 coefficients shown in Fig. 2 are calculated. In this case, the 4 sub-equations are given by

[The four 3 x 3 sub-equations are illegible in the source.]

where $\mathcal{A}_{mn} = \mathcal{A}_r(m,n)$ and $\Phi_{mn} = \Phi(m,n)$.

4. SIMULATION RESULTS AND DISCUSSION 

For the computer simulations, the input image is divided into 8x8 blocks, and the transform coefficients are quantized with a uniform quantizer using the quantization step-size table recommended in JPEG [7]. The intensities of corrupted blocks are replaced with 128, the middle value of the image intensity range.

Fig. 3 shows the PSNR performance versus BLR for the Lena image. Even at a BLR as high as 10^-1, the recovered images show high PSNR values. With a BLR of O(10^-3), the recovered images are hardly distinguishable from their error-free counterparts. Fig. 4 shows the corrupted Lena image with a BLR of 5 x 10^-2, and Fig. 5 shows the image of Fig. 4 reconstructed by the proposed algorithm. Except for blocks with sharp edges or in successive damaged areas, the reconstructed blocks are hardly recognizable.

Now we briefly discuss the computational complexity. The proposed algorithm requires fewer multiplications and additions than other EC algorithms, such as [1] and [4]. It can be shown that the computational complexity of [1] is $O(N^7)$. Also note that [4] is based on complex fuzzy logic computations. On the other hand, the numbers of multiplications and additions of the proposed algorithm can be obtained by examining (7) and (12); they are $4v(4N + v)$ and 4v([illegible]N + v - 2), respectively. For example, in the case of $N = 8$ and $v = 3$, the numbers of multiplications and additions are 420 and 760, respectively.



5. CONCLUSION 

A new EC technique for DCT based image coding has been presented in this paper. The proposed algorithm recovers the damaged blocks based on the smoothness property of an image at the boundaries of the blocks. In our approach, the intersample variations between adjacent blocks are described by an object function, and the DCT coefficients which minimize the object function are obtained by solving a linear equation. Furthermore, we have shown that the linear equation can be decomposed and reduced into simpler sub-equations. Thus, the computational complexity of the proposed algorithm is very low compared to the existing techniques. The simulation results indicate that the proposed algorithm recovers the damaged blocks even if the BLR is of the order of 10^-3. Further applications of the proposed algorithm include ATM layered coding and full-motion video applications, such as H.261 and MPEG.
ABSTRACT

We first review the development of tree-based embedded image coding methods. We then explore their use with different image transformations, the reasons for their effectiveness and low complexity, and their flexibility and incorporation into state-of-the-art compression systems.

1. INTRODUCTION 

Six years after their introduction, tree-based algorithms for embedded image coding have established a firm reputation as some of the most powerful, efficient and versatile methods known for image compression. They are now studied by a very active group of researchers, and this paper presents a short survey of the state of the art in the field.

Interestingly, these compression methods initially faced a certain skepticism. They were able to support very desirable features, like embedded coding, fast compression and decompression, very precise distortion or rate control, support for lossless compression, great scalability, no need for training, etc. At the same time, they performed better than even the methods painstakingly adapted to support only one (or a few) of those features. To some, it seemed too good to be true.

One possible reason for this skepticism was that they use two techniques that had been used for some time, but which had been yielding mediocre results. The first technique is tree-based recursive partitioning, as employed in quadtree image compression [26]. The second is bit-plane coding [13], which permits progressive image transmission with gradual quality improvement.

The origins of these methods can be traced back to when 
subband/wavelet [30, 31] image coding methods began to 
gather wider acceptance, as even very simple schemes could 
produce remarkably good visual quality, together with pro- 
gressive image transmission. Lewis and Knowles [9] were 
the first to use trees to exploit the statistical properties 
found in the pyramidal decomposition of natural images. 
They proposed an efficient decomposition and coding scheme 
using tree structures that follow the same spatial location 
across different subbands, which we call spatial orientation 
trees. 

Then, Shapiro [21, 22] proposed a very clever method 
to combine bit-plane coding, applied to the wavelet coef- 
ficients, with a tree-based partitioning similar to the one 
developed by Lewis and Knowles. This combination, called embedded zerotree wavelet (EZW) coding, is by no means a trivial juxtaposition of methods. It identified that efficient compression required separately coding the significance data (which indicates whether a pixel magnitude is larger than a threshold) and the rest of the bit planes. Further, it showed how efficiently tree structures can code the significance data of wavelet image pyramids.
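To illustrate the separation that EZW relies on, the sketch below splits plain bit-plane coding of a list of integer coefficients into significance bits (is |c| >= 2^n yet?) and refinement bits (bit n of coefficients already found significant in an earlier plane). It is a simplified illustration without zerotrees, set partitioning, sign bits, or entropy coding, and the function name is hypothetical.

```python
def bitplane_bits(coeffs, num_planes):
    """Return (significance_bits, refinement_bits) for the top `num_planes`
    bit planes of integer coefficients. Sign bits are omitted for brevity."""
    mags = [abs(c) for c in coeffs]
    top = max(mags).bit_length() - 1  # highest plane containing a set bit (-1 if all zero)
    significant = [False] * len(mags)
    sig_bits, ref_bits = [], []
    for plane in range(top, max(top - num_planes, -1), -1):
        threshold = 1 << plane
        already = [i for i, s in enumerate(significant) if s]
        # Significance (sorting) pass: test every coefficient not yet significant.
        for i, m in enumerate(mags):
            if not significant[i]:
                bit = int(m >= threshold)
                sig_bits.append(bit)
                significant[i] = bool(bit)
        # Refinement pass: next magnitude bit of coefficients significant before this plane.
        for i in already:
            ref_bits.append((mags[i] >> plane) & 1)
    return sig_bits, ref_bits
```

For [34, -5, 12, 3] and four planes, this yields 12 significance bits but only 4 refinement bits; it is the significance information that the tree structures are designed to code efficiently.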

Shortly after EZW was introduced, we presented a more general analysis of the algorithm [14, 15], relating it to some selection (sorting) and set-partitioning problems. We also proposed some changes that improved its performance, and showed how to use it for lossless image compression [16, 18]. Later we studied how to improve the method to achieve better embedding, faster coding and decoding, simpler and more efficient entropy coding, and lower memory usage. The result was a new algorithm, called SPIHT, that indeed achieved all those goals [17].

In this paper we present a cross section of recent developments in the field. Due to lack of space the survey is not meant to be comprehensive, and we apologize for all missing references. We also had to exclude important work that is embedded but not tree-based, or tree-based but not embedded, as well as extensions to video, etc.

2. USE WITH DIFFERENT IMAGE 
TRANSFORMATIONS 

One common misconception about tree-based embedded methods is that they work only with wavelets. They were indeed designed to exploit special properties of the wavelet pyramid coefficients, under a particular tree structure. Nonetheless, a number of experiments show that they are quite effective for other transformations as well. For instance, they have been successfully employed with 8x8 DCTs, and even better results have been obtained with 16 x 16 DCTs [32, 11].

Among several exciting developments are schemes for wavelet packets [33] and general lapped block transforms [28]. The lifting scheme [27] allows the development of better transforms for lossless compression [1], and of wavelets on manifolds [20]. Tree-based embedded coding methods are readily adapted anywhere the lifting scheme can be used. For example, an interesting application is the efficient (and embedded) compression of functions defined on the surface of a sphere [6].

Figure 1: Performance of different progressive transmission methods when used for embedded coding.

While reading this paper the reader should keep in mind 
that we are discussing a coding algorithm, and as such, its 
performance (compression efficiency, memory usage, recon- 
struction visual quality, etc.) depends on the image trans- 
formation being used. 

3. OPTIMIZED EMBEDDED CODING AND 
NON-EMBEDDED CODING 

Shapiro [22] correctly defined his embedded coding with two objectives: "(1) obtaining the best image quality for a given bit rate; (2) all encodings of the same image at lower bit rates are embedded in the beginning of the bit stream for the target bit rate." Without the first condition, almost any coding method can be called embedded, but possibly with dismal performance at intermediate rates. For example, Figure 1 shows the performance¹ of SPIHT [17], compared to a method designed for progressive resolution [19] and adapted for embedded coding.

Unfortunately, the literature is now filled with methods that use the expression "embedded coding" incorrectly. In fact, image pyramids are sometimes called "embedded wavelets," so that any coding method using them is also called embedded.

It is important to observe that bit-plane embedded coding does not exclude the possibility of using other forms of progressive transmission. For instance, with the proper file format it is possible to recover subbands independently. An impressive application based on this fact is a program that allows real-time enhancement, up to lossless recovery, in a user-selected region [4].



¹ PSNR = 10 log10(255² / (mean squared error)).
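The PSNR definition in the footnote translates directly into code. This is a minimal sketch for 8-bit images given as flat pixel sequences; the function name and the `peak` parameter are our own choices.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB of `reconstructed` against `original` (equal-length pixel sequences):
    10 * log10(peak**2 / MSE), with peak = 255 for 8-bit images."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)
```

A uniform error of 10 grey levels (MSE = 100) gives about 28.13 dB.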



Embedded coding necessarily adds some extra complexity to the coding algorithm, as bit planes are sequentially visited, and it is true that embedded coding methods may not be as fast as the fastest non-embedded methods. However, embedded tree-based methods are not much worse either. First, it should be noted that the computational effort spent in the first passes is insignificant compared to the effort in the last pass, so the total is not too different from that of a single-pass algorithm (non-tree-based methods do not have this property). Second, it is not a fair comparison, as there are "shortcuts" that can be taken when embedded coding is not required.

4. THEORETICAL ANALYSIS 

The development of tree-based embedded coding methods followed an unusual path: a rigorous theoretical basis is being discovered many years after the practical methods were devised from empirical analysis of distributions in wavelet pyramids formed from natural images. Little was known about which images would be "natural" enough to yield similar results. By the same token, little was known about how to use these methods with different image transformations, like DCTs.

Davis and Chawla [5] proposed a method based on dynamic programming to find the optimal tree structures (set-partitioning rules) for different transformations (like DCTs). Surprisingly, the variation found among different trees can be quite small, showing that tree-based algorithms are by nature quite robust.

New work [25] models the dependence that exists between the magnitudes of wavelet coefficients at different scales and orientations. This dependence is what originally led to the use of the expression "self-similarity" to explain the efficacy of tree-based coding. However, the expression was never used in [21, 22, 15, 17] to mean fractal self-similarity: it meant similarity in a distribution sense. So, in contrast with theories based on conceptual analysis [8], the statistical models above explain the success of tree-based coding not only with wavelets, but also with other transforms.

5. COMPUTATIONAL COMPLEXITY 

The speed of bit-plane embedded coding methods is determined by the need to perform several passes over the image, comparing magnitudes in each pass. Tree-based methods have an advantage over other methods because the necessary information is stored in linked lists, which, for the most common bit rates, are much smaller than the total number of pixels. Experiments show that most of the CPU time is spent testing for significance, and only a small fraction is used for pixel value refinement.

The encoder first needs to compute the maximum magnitudes of the pixels in the trees, but this can be done very efficiently with a single pass of binary-OR operations during the image transformation [23]. So, this pass through the whole image is computationally equivalent to a single pass of other bit-plane algorithms.
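The OR trick works because the bitwise OR of non-negative integers has the same highest set bit as their maximum, so testing (OR >> n) != 0 is equivalent to testing max >= 2^n. The sketch below computes, bottom-up, the OR of all descendant magnitudes for a simplified quadtree in which node (i, j) has children (2i, 2j), (2i, 2j + 1), (2i + 1, 2j) and (2i + 1, 2j + 1). The root (0, 0) is skipped, the exact parent-child rules of EZW and SPIHT in the lowest band differ, the OR is computed here in a separate pass rather than during the transform as in [23], and the function names are hypothetical.

```python
def descendant_or(mags):
    """For each node (i, j) of an n x n magnitude array, return the bitwise OR of the
    magnitudes of all its descendants in a simplified quadtree where (i, j) has
    children (2i, 2j), (2i, 2j + 1), (2i + 1, 2j) and (2i + 1, 2j + 1)."""
    n = len(mags)
    d = [[0] * n for _ in range(n)]
    # Reverse raster order visits every child before its parent.
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if i == 0 and j == 0:
                continue  # the root would be its own child in this simplified scheme
            for ci in (2 * i, 2 * i + 1):
                for cj in (2 * j, 2 * j + 1):
                    if ci < n and cj < n:
                        d[i][j] |= mags[ci][cj] | d[ci][cj]
    return d


def set_significant(or_value, plane):
    """Same answer as max(descendant magnitudes) >= 2**plane, since OR and max share their top bit."""
    return (or_value >> plane) != 0
```

For instance, a node whose descendants have magnitudes 2 and 5 gets the OR value 7; both 7 and the true maximum 5 have bit 2 as their highest set bit, so the significance tests at every threshold agree.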
Next, both the encoder and decoder have to maintain a dynamic list of coefficients. Shapiro's EZW algorithm [22] adds four descendants to the lists whenever a parent or descendant is significant. One improvement is to add descendants only when at least one descendant is significant [14, 15]. This yields a very significant reduction in the number of elements in the lists, with a proportional reduction in the CPU time and memory required for list processing.

The next speed bottleneck was the requirement of using arithmetic coding for effective compression. The following breakthrough was the proposal of a different, and even more compact, tree set partitioning [17], which keeps more insignificant descendants grouped together. This representation was efficient enough to yield excellent results, even without any form of entropy coding. Furthermore, it allows dealing with groups of 2 x 2 pixels/trees, for another significant reduction of list-management resources.

The first implementations had nearly equal encoding and decoding times, but this was due to the arithmetic coder. With faster entropy coding the decoder can be considerably faster, because it does not require computation of maximum magnitudes.

These new coding algorithms yielded such a large reduction in coding/decoding complexity that, in a software implementation, coding/decoding became much faster than the image transformation. Even though list processing is frequently thought to be a complex task, it is a well-known programming problem, and thus has been extensively studied and optimized. Consequently, tree-based decoding algorithms can be faster than any other algorithm, because they process only a small fraction of the image pixels.

Significant improvements have been proposed for
specialized architectures and parallel processing [2]. A
method that exploits embedded coding to reduce the
complexity of the forward and inverse wavelet transforms
is proposed by Paris et al. [12].

6. RESILIENCE TO TRANSMISSION ERRORS 

Error control is done by adding redundant bits or by
requesting retransmission. Either approach causes a
bandwidth overhead that depends on the error rate and on
the efficiency of the protection method. Transmission
errors can drastically reduce the effective bandwidth if
improper error control procedures are used. Guaranteeing
error-free delivery at all times is indispensable for text
or database data, but it is not necessary for still images
and video, because we can still visually gather most of the
information when only some parts are degraded. However, in
all efficient compression methods, an error in the
compressed data can lead to a possibly long sequence of
errors, or to a catastrophic loss of all data after the
error.
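A toy example shows the propagation effect. One flipped bit in a variable-length (prefix) coded stream desynchronizes the decoder, while the same error in a fixed-length code damages a single symbol. The four-symbol codes below are hypothetical stand-ins for a real entropy coder, and `symbol_errors` counts position-by-position mismatches plus any length difference:

```python
VLC = {"a": "0", "b": "10", "c": "110", "d": "111"}   # toy prefix (variable-length) code
FLC = {"a": "00", "b": "01", "c": "10", "d": "11"}    # fixed-length code for comparison


def encode(msg, code):
    return "".join(code[s] for s in msg)


def decode(bits, code):
    """Greedy prefix decoding; an incomplete trailing codeword is dropped."""
    inv = {v: k for k, v in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return "".join(out)


def flip(bits, pos):
    """Invert the bit at position pos (a single channel error)."""
    return bits[:pos] + ("1" if bits[pos] == "0" else "0") + bits[pos + 1:]


def symbol_errors(sent, got):
    """Symbols that differ position by position, plus any length mismatch."""
    return sum(x != y for x, y in zip(sent, got)) + abs(len(sent) - len(got))


msg = "abacabad"
for name, code in (("variable-length", VLC), ("fixed-length", FLC)):
    got = decode(flip(encode(msg, code), 1), code)
    print(name, got, symbol_errors(msg, got))
# variable-length aaaacabad 7
# fixed-length bbacabad 1
```

In the variable-length case, the decoder happens to resynchronize a few symbols later. However, every subsequent symbol is shifted by one position, which in an image would put correct values at wrong pixel locations.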

As explained above, embedded coding is, by definition, 
based on sorting the image components according to their 
importance. This can be extremely useful when designing 
error control systems, as error protection resources can be 
assigned according to the importance of the data. 
Figure 2: Fraction of bits that do not need error
protection, according to embedded coding method.

Bit-plane coding helps in two ways. First, it guarantees
that after an error only the data below a certain bit plane
is corrupted. Second, the least significant bits do not
require entropy coding, so those bits need no error
protection: without variable-length coding there is no
catastrophic error propagation.
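Both properties can be checked numerically on coefficient magnitudes. If corruption starts at plane p, every coefficient stays within less than 2^(p+1) of its true value. A single flipped raw (not entropy-coded) refinement bit changes only one coefficient, by exactly 2^p. This sketch ignores signs and the interleaving of significance and refinement passes, and the helper names are hypothetical:

```python
import random


def to_planes(mag, nbits):
    """Bits of mag, most significant plane first."""
    return [(mag >> p) & 1 for p in range(nbits - 1, -1, -1)]


def from_planes(bits):
    """Rebuild a magnitude from its bits, most significant plane first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def corrupt_from_plane(mag, nbits, p, rng):
    """Keep planes nbits-1 .. p+1 intact and replace planes p .. 0 with garbage.

    Models an error first hit while decoding plane p: everything decoded
    before it (the higher planes) is still correct.
    """
    bits = to_planes(mag, nbits)
    k = nbits - 1 - p                  # list index of plane p
    return from_planes(bits[:k] + [rng.randint(0, 1) for _ in bits[k:]])


def flip_refinement_bit(mag, p):
    """A single channel error in an uncoded bit of plane p."""
    return mag ^ (1 << p)


rng = random.Random(0)
nbits, p = 8, 3
worst = max(abs(corrupt_from_plane(m, nbits, p, rng) - m) for m in range(1 << nbits))
print("worst error:", worst, "bound:", 1 << (p + 1))
```

The refinement-bit case touches only the coefficient that owns the bit. This is the sense in which such bits do not propagate errors.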

On the other hand, any error in the significance data
leads to catastrophic error propagation. The significance
data can be protected with common error-correction codes.
An interesting alternative is to slightly change the coding
algorithm [10] to avoid error propagation. Figure 2 shows
the fraction of bits that do not lead to error propagation
in a bit-plane compressed file (assuming a previous wavelet
decomposition), using the original and modified coding
methods.
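As one example of a common error-correction code, the sketch below protects a significance bit stream with a Hamming(7,4) code. This code corrects any single bit error in each 7-bit block at the cost of 3 parity bits per 4 data bits. The choice of Hamming(7,4), the zero padding, and the function names are assumptions for illustration, since the text does not specify which code to use:

```python
def hamming74_encode(d):
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4                  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4                  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4         # 1-based error position, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def protect(bits):
    """Encode a significance bit stream in 4-bit blocks (zero-padded)."""
    bits = list(bits) + [0] * (-len(bits) % 4)
    out = []
    for k in range(0, len(bits), 4):
        out += hamming74_encode(bits[k:k + 4])
    return out


def recover(coded, n):
    """Decode 7-bit blocks and drop the padding to get n bits back."""
    out = []
    for k in range(0, len(coded), 7):
        out += hamming74_decode(coded[k:k + 7])
    return out[:n]


sig = [1, 0, 0, 1, 1, 1, 0, 1, 0, 1]   # hypothetical significance bits
coded = protect(sig)
coded[9] ^= 1                           # one channel error
print(recover(coded, len(sig)) == sig)  # True
```

Only the significance bits would pass through `protect`. The refinement bits discussed earlier can be sent unprotected, which keeps the overhead confined to the fraction of the file that actually propagates errors.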

Several other ingenious solutions have been proposed to
improve resilience to transmission errors, such as special
transmission protocols, adapted algorithms, and combination
with error correction [3, 24, 29].

7. CONCLUSIONS 

We have traced the development of tree-based, embedded
coding techniques to Lewis and Knowles [9] and Shapiro [22]
and have shown their current realization in SPIHT coding
and several variants that accommodate more features, such
as resilience to channel errors and region-of-interest
coding. It is apparent that, due to their high performance,
low complexity, and versatility, these techniques represent
the current state of the art in image coding.